Abstract Mining the silent members of an online community, also called
lurkers, has been recognized as an important problem that accompanies
the extensive use of online social networks (OSNs). Existing solutions
to the ranking of lurkers can aid in understanding lurking behaviors in
an OSN. However, they are limited to using only structural properties of
the static network graph, thus ignoring any relevant information
concerning the time dimension. Our goal in this work is to push forward
research in lurker mining in a twofold manner: (i) to provide an
in-depth analysis of temporal aspects that aims to unveil the behavior
of lurkers and their relations with other users, and (ii) to enhance
existing methods for ranking lurkers by integrating different time-aware
properties concerning information-production and information-consumption
actions. Network analysis and ranking evaluation performed on Flickr,
FriendFeed and Instagram networks allowed us to draw interesting remarks
on both the understanding of lurking dynamics and on transient and
cumulative scenarios of time-aware ranking.

Lurking is a very common behavior among online users, which is usually
associated with definitions of nonparticipation, infrequent or
occasional posting and, more generally, with observation and bystander
behavior [45, 49]. As a fundamental premise, it should be noted that
lurkers should not be trivially regarded as totally inactive users,
i.e., registered users who do not use their account to join an online
community; rather, a lurker can be perceived as someone who benefits
from others' information and services without significantly giving back
to the online community.

The main general reasons behind the multifaceted nature of this kind of
user behavior are well explained in social science, based on various
motivational factors, such as environmental, commitment, quality
requirements, and individual factors [54]. In general, lurkers represent
an enormous potential in terms of social capital, because they acquire
knowledge from the online community but never or rarely let other people
know their opinions. Lurking can indeed be expected or even encouraged
because it allows users to learn or improve their understanding of the
etiquette of an online community before they can decide to provide a
valuable contribution over time [22]. Within this view, a major goal is
to de-lurk those users, i.e., to encourage lurkers to more actively
participate in the online community life: indeed, even though a proper
amount of lurkers is acceptable for a large-scale social environment,
too many individuals of that kind would impair the virality of the
online community.

However, a complete characterization of lurkers has been a controversial
issue in social science and human-computer interaction research [22],
which has consequently posed several challenges in (quantitatively)
analyzing lurking in online social networks (OSNs). Despite the fact
that lurkers represent the large majority of members in an OSN, little
research in computer science has considered lurking as a valid form of
online behavior that is worthy of investigation. In [56, 55], we filled
a gap in knowledge on the opportunity of analyzing lurkers in OSNs, and
on the important implications that the detection of lurkers can have on
a deeper understanding of the feelings in an online community. We
addressed the previously unexplored problem of ranking lurkers in an
OSN, by introducing a topology-driven lurking definition and proposing a
computational framework that offers various solutions to the ranking of
lurkers. However, a limitation of that study is that it does not exploit
temporal information to enhance the understanding and ranking of lurkers
in an OSN.
Online social environments are highly dynamic systems, as individuals
join, participate, attract, cooperate, and disappear across time. This
clearly affects the shape of the network both in terms of its social
(fellowship) and interaction graphs [5]. Moreover, it is widely accepted
that users normally look for the most up-to-date information; therefore,
the timeliness of users and their relations becomes essential for
evaluation. Like any other user, lurkers may be interested not only in
the authoritative sources of information, but also in the timely
sources.

Research on temporal network analysis and mining strives to understand
the driving forces behind the evolution of OSNs and what dynamical
patterns are produced by an interplay of various user-related dimensions
in OSNs. Dealing with the temporal dimension to mine lurkers appears to
be even more challenging. Yet, it is also an emergent necessity, as
users in an OSN naturally evolve playing different roles, showing a
stronger or weaker tendency toward lurking at different times. Moreover,
as the temporal dimension in an OSN is generally examined in terms of
the online frequency of the users, it is important to take into account
that lurkers may have an unusual frequency of online presence as well as
an unusual frequency of interaction with other users.

Contributions. Our contributions in this work are twofold. First, we
provide insights into the understanding of lurkers from various
perspectives along the time dimension in an OSN environment. We conduct
different stages of temporal analysis of lurking behaviors, focusing on
two macro aspects: how lurkers relate to other types of users in the
network, and how patterns of lurking behaviors evolve over time.

Second, we overcome the time-related limitation of previous formulations
of lurker ranking methods. To this purpose, we model different temporal
aspects concerning both the production and consumption of information,
by introducing novel measures of freshness and activity trend at user
and at user-interaction level. These measures are key ingredients in the
proposed time-aware lurker ranking methods, for which we develop two
approaches: a time-transient based ranking approach, which is restricted
to a particular snapshot graph of the network, and a time-cumulative
based ranking approach, which encompasses a sequence of snapshots based
on a time-evolving definition of freshness and activity functions.

We structure our work into seven research questions (Q1-Q7), which are
summarized as follows.

— Lurking is often related to inactive behavior or to inexperienced
  usage of the network services at a given time. Therefore, we aim at
  unveiling whether and to what extent there exists any correspondence
  between lurkers and zero-contributors (Q1), and between lurkers and
  newcomers (Q2). From a different perspective, we also want to
  understand whether lurkers create preferential relations with active
  users (Q3).

— Responsiveness, i.e., the willingness of a user to respond to other
  users, is a key criterion to measure behavioral dynamics of users in
  an OSN. We are hence interested in quantifying how frequently lurkers
  react to the postings of other users (Q4).

— We investigate how lurking trends evolve over time and how these can
  be characterized using a clustering framework (Q5). Moreover, by
  involving also the content dimension, we analyze the topical usage
  behaviors of lurkers and their topic-sensitive evolution patterns
  (Q6).

— We assess the ability of our proposed time-aware lurker ranking
  algorithms in providing improved solutions to the lurker ranking
  problem. We evaluate the impact of the proposed time-transient and
  time-cumulative based approaches on the ranking performance, and also
  compare them with a state-of-the-art time-aware ranking algorithm [7]
  (Q7).

Plan of the paper. The remainder of this paper is organized as follows.
Section 2 first provides a short overview of our early work on lurker
detection and ranking in OSNs, then focuses on our proposal of
time-aware LurkerRank methods. We answer each of the above stated
research questions in Section 3. The subsequent sections discuss related
work and conclude the paper.

2 Time-aware Lurker Ranking 

2.1 LurkerRank at a glance 

In [56, 55], we developed the first formal computational methodology for
lurker detection and ranking. We provided well-principled definitions of
lurking, introduced a network graph model oriented to the analysis and
mining of lurkers, and defined methods to search and rank lurkers in an
OSN.

Our initial definition of lurking relies solely on the topology
information available in an OSN, modeled as a fellowship graph. Upon the
assumption that lurking behaviors build on the amount of information a
user receives, our key intuition is that the strength of a user's
lurking status can be determined based on her/his in/out-degree ratio
(i.e., followee-to-follower ratio), and on that of her/his neighborhood.
We report next the topology-driven lurking definition from [56]:
Definition 1 (Topology-driven lurking) Let $\mathcal{G} = (\mathcal{V},
\mathcal{E})$ denote the directed graph representing an OSN, with set of
nodes (users) $\mathcal{V}$ and set of edges $\mathcal{E}$, whereby the
semantics of any edge $(u, v)$ is that $v$ is consuming information
produced by $u$. A node $v$ with infinite in/out-degree ratio (i.e., a
sink node) is trivially regarded as a lurker. A node $v$ with
in/out-degree ratio not below 1 shows a lurking status, whose strength
is determined based on:

Principle I: Overconsumption. The excess of information-consumption over
  information-production. The strength of $v$'s lurking status is
  proportional to its in/out-degree ratio.

Principle II: Authoritativeness of the information received. The
  valuable amount of information received from its in-neighbors. The
  strength of $v$'s lurking status is proportional to the influential
  (non-lurking) status of $v$'s in-neighbors.

Principle III: Non-authoritativeness of the information produced. The
  non-valuable amount of information sent to its out-neighbors. The
  strength of $v$'s lurking status is proportional to the lurking status
  of $v$'s out-neighbors.

The above principles form the basis for three ranking functions that
differently account for the contributions of a node's in-neighborhood
and out-neighborhood. We finally provided a complete specification of
our lurker ranking models in terms of PageRank-style methods. For the
sake of brevity, here and throughout this paper we will refer to only
one of the formulations described in [56, 55], namely the one based on
the full in-out-neighbors-driven lurker ranking, hereinafter simply
dubbed LurkerRank.

Given a node $v \in \mathcal{V}$, let $B_v$ and $R_v$ denote the set of
in-neighbors (i.e., backward nodes) and the set of out-neighbors (i.e.,
reference nodes) of $v$, respectively. The in-degree and out-degree of
$v$ are denoted as $in(v) = |B_v|$ and $out(v) = |R_v|$, respectively.
The following formula gives the LurkerRank $LR(v)$ for any node $v$:

$$LR(v) = d\,[\mathcal{L}_{in}(v)\,(1 + \mathcal{L}_{out}(v))] + (1 - d)/|\mathcal{V}| \tag{1}$$

where $\mathcal{L}_{in}(v)$ is the in-neighbors-driven lurking function:

$$\mathcal{L}_{in}(v) = \frac{1}{out(v)} \sum_{u \in B_v} \frac{out(u)}{in(u)}\, LR(u) \tag{2}$$

and $\mathcal{L}_{out}(v)$ is the out-neighbors-driven lurking function:

$$\mathcal{L}_{out}(v) = \frac{in(v)}{\sum_{u \in R_v} in(u)} \sum_{u \in R_v} \frac{in(u)}{out(u)}\, LR(u) \tag{3}$$

Moreover, $d$ is a damping factor ranging within [0,1], usually set to
0.85. To prevent zero or infinite ratios, the values of $in(\cdot)$ and
$out(\cdot)$ are Laplace add-one smoothed.
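
As an illustration of Eqs. (1)-(3), the following is a minimal Python sketch of LurkerRank computed by fixed-point iteration. It is a sketch under stated assumptions rather than the reference implementation: the function name `lurker_rank`, the uniform initialization, the iteration cap, the stopping tolerance, and the choice of setting the out-neighbors-driven term to zero for nodes without out-neighbors are all illustrative choices that the text does not specify.

```python
from collections import defaultdict


def lurker_rank(edges, d=0.85, max_iters=100, tol=1e-10):
    """Sketch of LurkerRank, Eqs. (1)-(3).

    edges: iterable of (u, v) pairs, meaning v consumes information
    produced by u. Returns a dict node -> score.
    """
    edges = list(edges)
    nodes = sorted({x for e in edges for x in e})
    in_nb = defaultdict(set)   # B_v: in-neighbors of v
    out_nb = defaultdict(set)  # R_v: out-neighbors of v
    for u, v in edges:
        in_nb[v].add(u)
        out_nb[u].add(v)

    n = len(nodes)
    # Laplace add-one smoothed degrees, to avoid zero or infinite ratios.
    ind = {v: len(in_nb[v]) + 1 for v in nodes}
    outd = {v: len(out_nb[v]) + 1 for v in nodes}

    lr = {v: 1.0 / n for v in nodes}  # assumption: uniform start
    for _ in range(max_iters):
        new = {}
        for v in nodes:
            # Eq. (2): in-neighbors-driven lurking term.
            l_in = sum(outd[u] / ind[u] * lr[u] for u in in_nb[v]) / outd[v]
            # Eq. (3): out-neighbors-driven lurking term
            # (assumption: zero when v has no out-neighbors).
            if out_nb[v]:
                norm = sum(ind[u] for u in out_nb[v])
                l_out = ind[v] / norm * sum(
                    ind[u] / outd[u] * lr[u] for u in out_nb[v])
            else:
                l_out = 0.0
            # Eq. (1): combination with damping factor d.
            new[v] = d * l_in * (1.0 + l_out) + (1.0 - d) / n
        delta = sum(abs(new[v] - lr[v]) for v in nodes)
        lr = new
        if delta < tol:
            break
    return lr


scores = lurker_rank([("a", "b"), ("a", "c"), ("b", "c")])
```

In this toy graph, node `c` only consumes (from `a` and `b`) while `a` only produces, so `c` receives the highest lurking score and `a` the lowest.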


2.2 Time-aware LurkerRank methods 

In this section we describe our extensions to LurkerRank that account
for the temporal dimension when determining the lurking scores of users
in the network. We follow two approaches based on different models of
temporal graph:

• Transient ranking, i.e., a measure of a user's lurking score based on
a time-static (snapshot) graph model;

• Cumulative ranking, i.e., a measure of a user's lurking score that
encompasses a given time interval (sequence of snapshots), based on a
time-evolving graph model.

The building blocks of our methods rely on the specification of the
temporal aspects of interest, namely freshness and activity trend, both
at user and at user relation level. Freshness takes into account the
timestamps of the latest information produced (i.e., posted) by a user,
or the timestamps of the latest information consumed by a user in
relation to another user's action. Activity trend models how the users'
posting actions or the responsive actions vary over time. These concepts
will be elaborated on in the next section.


2.2.1 Freshness and activity trend functions 

Users in the network are assumed to perform actions and interact with
each other over a timespan $T \subseteq \mathbb{T}$. The temporal domain
$\mathbb{T}$ is conveniently assumed to be $\mathbb{N}$. Therefore, the
time-varying graph of an OSN is seen as a discrete time system, i.e.,
the time is discretized at a fixed granularity (e.g., day, week, month).

Freshness. Let $T \subseteq \mathbb{T}$ be a temporal subset of
interest, being in interval notation of the form $T = [t_s, t_e]$, with
$t_s < t_e$. For any time $t$, we define the freshness function
$\varphi_T(t)$ as:

$$\varphi_T(t) = \begin{cases} 1/\log_2(2 + (t_e - t)), & \text{if } t \in T \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$

Function $\varphi_T(t)$ ranges within [0,1]. Note that we opt for a
function with logarithmic decay to ensure, as $(t_e - t)$ gets larger, a
slower decrease w.r.t. other decreasing functions with values in (0,1];
for instance, the graph of $\varphi_T(t)$ always lies above the graph of
$2/(1 + \exp(t_e - t))$, or of $1/(1 + (t_e - t))$.
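
To make the decay behavior of Eq. (4) concrete, here is a small Python sketch of the freshness function, together with the two alternative decreasing functions mentioned above. The function names are hypothetical and chosen only for this illustration.

```python
import math


def freshness(t, t_s, t_e):
    """Freshness function of Eq. (4) for the interval T = [t_s, t_e]."""
    if t_s <= t <= t_e:
        return 1.0 / math.log2(2 + (t_e - t))
    return 0.0


def logistic_decay(t, t_e):
    """Alternative 2 / (1 + exp(t_e - t)), used only for comparison."""
    return 2.0 / (1.0 + math.exp(t_e - t))


def hyperbolic_decay(t, t_e):
    """Alternative 1 / (1 + (t_e - t)), used only for comparison."""
    return 1.0 / (1.0 + (t_e - t))


# All three functions equal 1 at t = t_e; for older times the
# logarithmic decay of Eq. (4) stays above both alternatives.
for age in range(1, 6):
    t = 10 - age
    assert freshness(t, 0, 10) > logistic_decay(t, 10)
    assert freshness(t, 0, 10) > hyperbolic_decay(t, 10)
```

For example, with $T = [0, 10]$ an action at time 8 has freshness $1/\log_2 4 = 0.5$, whereas the hyperbolic alternative would already have dropped to $1/3$.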










Andrea Tagarelli, Roberto Interdonato 


Given a user $u$, let $T_u$ be the set of time units at which $u$
performed actions in the network. The freshness of $u$ at a given
temporal subset of interest $T$ is defined as:

$$f_T(u) = \max\{\varphi_T(t),\ t \in T_u \ \text{s.t.}\ t_s \le t \le t_e\} \tag{5}$$

Note that $f_T(u)$ is always defined and positive, for all $t \in T$.
Higher values of $f_T(u)$ correspond to more recent activities of $u$
w.r.t. $T$.
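
The user freshness of Eq. (5) can be sketched in Python as follows. The function names are hypothetical. One assumption is made explicit in the code: when a user has no action inside the interval, the sketch returns 0, which is the case that the weighting scheme of the time-static ranking later treats as "freshness equal to zero".

```python
import math


def freshness(t, t_s, t_e):
    """Freshness function of Eq. (4) for the interval T = [t_s, t_e]."""
    if t_s <= t <= t_e:
        return 1.0 / math.log2(2 + (t_e - t))
    return 0.0


def user_freshness(action_times, t_s, t_e):
    """User freshness of Eq. (5): best freshness over the user's actions.

    action_times: time units at which the user acted (the set T_u).
    Assumption: returns 0.0 if no action falls inside [t_s, t_e].
    """
    in_window = [t for t in action_times if t_s <= t <= t_e]
    if not in_window:
        return 0.0
    return max(freshness(t, t_s, t_e) for t in in_window)
```

Because the freshness function increases with $t$ inside the interval, the maximum is always attained at the user's most recent action within $T$.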

Activity trend. The second aspect we would like to understand is the
activity trend of a user. Let us first denote with

$$S_u = [(x_1, t_1), \ldots, (x_n, t_n)] \tag{6}$$

the time series representing the activity of user $u$ over $T_u$. For
every pair $(x, t) \in S_u$, $x$ denotes the number of $u$'s actions at
time $t$.

In order to model the temporal evolution of the activity of a user $u$,
we employ the Derivative time series Segment Approximation (DSA) [55]
and apply it to the user's activity time series $S_u$. DSA is able to
represent a time series in a concise form which is designed to capture
the significant variations in the time series profile. For any given
time series $S_u$ of length $n$, DSA produces a new series $\tau$ of $h$
values, with $h \le n$. The main steps performed by DSA are summarized
as follows:

— Step 1 - Derivative estimation: $S_u$ is transformed into $S'_u$,
  where each value $x \in S_u$ is replaced by its first derivative
  estimate.

— Step 2 - Segmentation: the derivative time series $S'_u$ is
  partitioned into $h$ variable-length segments. Each of the segments
  aggregates subsequent data values having very close derivatives, i.e.,
  it represents a subsequence of values with a specific trend.

— Step 3 - Segment approximation: each of the segments in $S'_u$ is
  mapped to an angular value $\alpha$, which collapses information on
  the average slope within the segment.
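
The three steps above can be sketched in Python as follows. This is a deliberately simplified illustration, not the DSA algorithm itself: the first-difference derivative estimate, the greedy segmentation rule, the threshold `eps`, and the use of each segment's last timestamp as its end time are all assumptions made here, since the text does not fix these details.

```python
import math


def dsa_sketch(series, eps=0.5):
    """Simplified DSA-like transformation of an activity time series.

    series: list of (x, t) pairs sorted by time t.
    Returns a list of (alpha_j, t_j) pairs, one per segment.
    """
    # Step 1 (assumption): estimate the derivative by first differences.
    derivs = [(series[i + 1][0] - series[i][0], series[i + 1][1])
              for i in range(len(series) - 1)]
    if not derivs:
        return []

    # Step 2 (assumption): greedy segmentation; a point joins the current
    # segment if its derivative is within eps of the segment's mean.
    segments = [[derivs[0]]]
    for dv, t in derivs[1:]:
        current = segments[-1]
        mean = sum(v for v, _ in current) / len(current)
        if abs(dv - mean) <= eps:
            current.append((dv, t))
        else:
            segments.append([(dv, t)])

    # Step 3: map each segment to the angle of its mean slope.
    result = []
    for seg in segments:
        mean = sum(v for v, _ in seg) / len(seg)
        result.append((math.atan(mean), seg[-1][1]))
    return result


activity = [(0, 0), (1, 1), (2, 2), (3, 3), (1, 4), (0, 5)]
trend = dsa_sketch(activity)
```

On this toy series the steady growth over the first three steps collapses into one segment with angle $\pi/4$, followed by two short segments with negative angles.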

The DSA series $\tau_u$ is of the form $\tau_u = [(\alpha_1, t_1),
\ldots, (\alpha_h, t_h)]$, such that $\alpha_j = \arctan(\mu(s_j))$ and
$t_j = t_{j-1} + l_j$, with $j \in [1..h]$, where $s_j$ is the $j$-th
segment, $l_j$ its length, and $\mu(s_j)$ the mean of its points.

As a post-processing step, the values $\alpha_j$ of the DSA sequence
$\tau_u$ are normalized within [0,1] by deriving the values
$\hat{\alpha}_j = \alpha_j/\pi + 1/2$. In this way, an increasing (resp.
decreasing) trend of activity will correspond to a value within (0.5,1]
(resp. [0,0.5)). Therefore, we define the activity trend of user $u$
(over the whole interval $T_u$) as the time sequence:

$$a(u) = [(\hat{\alpha}_1, t_1), \ldots, (\hat{\alpha}_h, t_h)] \tag{7}$$

Given a temporal interval of interest $T \subseteq T_u$, the activity
trend of $u$ w.r.t. $T$ corresponds to the subsequence $a_T(u)$ of
$a(u)$ that fits $T$. It is also useful to define the average activity
of $u$ over $T$, denoted by $\bar{a}_T(u)$, as the average of the
$\hat{\alpha}$ values within $a_T(u)$.
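
The normalization step and the average activity can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that the subsequence "that fits $T$" consists of the pairs whose timestamp falls inside $[t_s, t_e]$; it also assumes an average of 0 when no such pair exists.

```python
import math


def normalize_angle(alpha):
    """Map an angle in (-pi/2, pi/2) to [0, 1] via alpha / pi + 1/2."""
    return alpha / math.pi + 0.5


def average_activity(trend, t_s, t_e):
    """Average of the normalized angles whose timestamp lies in [t_s, t_e].

    trend: list of (normalized_angle, t) pairs, i.e. the sequence a(u).
    Assumption: returns 0.0 if no pair falls inside the interval.
    """
    values = [a for a, t in trend if t_s <= t <= t_e]
    return sum(values) / len(values) if values else 0.0


# A segment with mean slope +1 followed by one with mean slope -1.
trend = [(normalize_angle(math.atan(1.0)), 3),
         (normalize_angle(math.atan(-1.0)), 5)]
```

A flat segment (mean slope 0) maps to exactly 0.5, so values above 0.5 indicate growing activity and values below 0.5 indicate shrinking activity.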

Freshness and activity trend of interaction. The notions of freshness
and activity trend for individual users are here extended to model the
interaction of any two users $u, v$ at a given time $t$, which
corresponds to the directed edge $(u, v)$ (also here denoted as
$u \to v$) in the snapshot graph containing $t$. The rationale here is
that the more recent an interaction is (or the more increasing its
activity trend is), the stronger the relation $u \to v$ should be.
Recall that $(u, v)$ means that $v$ is consuming information at time $t$
produced by $u$.

Let us denote with $P_u = \{p_1, \ldots, p_k\}$ the set of
information-production actions of $u$, and with $T_u(P_u) = \{t_{p_1},
\ldots, t_{p_k}\}$ the associated timings. Moreover, let $C_{u \to v}$
be a set of triplets $(t_{p_i}, t_{c_j}, x_{c_j})$, such that
$t_{p_i} \in T_u(P_u)$, and $t_{c_j}$, $x_{c_j}$ denote the time and the
frequency, respectively, at which $v$ consumed $u$'s post $p_i$. Note
that we have used subscripts $p$ and $c$ to mean "production" and
"consumption", respectively.

According to the above formalism, we define the freshness of interaction
$u \to v$ w.r.t. $T$ as the maximum freshness over the sequence of pairs
(production-time, consumption-time) in $T$:

friu.v) = 

s.t. -) € ^ts"^ tp.j tc ^ ( 8 ) 

Analogously, we define the activity trend of interaction, based on the
DSA model previously used to define the activity trend of a user. To
this end, given the interaction $u \to v$, we consider the time series
$S_{u,v}$ representing pairs $(x, t)$, where $x$ denotes the number of
actions at time $t$ performed by $v$ in response to a specific post by
$u$. Then, we compute the activity trend of interaction $u \to v$,
denoted with $a_T(u, v)$, as the result of the application of DSA to the
time series $S_{u,v}$. The definition of the average activity of
interaction, $\bar{a}_T(u, v)$, is analogous to that of $\bar{a}_T(u)$.

2.2.2 The time-static LurkerRank algorithm 

Our first formulation of time-aware LurkerRank is based on a time-static
graph model, which contains one single snapshot of the network. Our key
idea is to capitalize on the previously proposed functions of freshness
and activity to define a time-aware weighting scheme that determines
both the strength of the productivity of a user and the strength of the
interaction between any two users linked at a given time. To this
purpose, we introduce two real-valued, non-negative coefficients
$\omega_f, \omega_a$ to control the importance of the freshness and the
activity trend in the weighting scheme.

Given a temporal interval of interest $T$, and coefficients $\omega_f,
\omega_a$, we define the function $w_T(\cdot)$ in terms of the user
freshness and average activity calculated for any user
$v \in \mathcal{V}$:

$$w_T(v) = \begin{cases} \omega_f f_T(v) + \omega_a \bar{a}_T(v), & \text{if } f_T(v) \neq 0,\ \bar{a}_T(v) \neq 0 \\ f_T(v), & \text{if } f_T(v) \neq 0,\ \bar{a}_T(v) = 0 \\ 1, & \text{otherwise} \end{cases} \tag{9}$$

By default, the two coefficients are set uniformly as $\omega_f =
\omega_a = 0.5$. If $v$ has been active within $T$ (i.e., $f_T(v) \neq
0$) and the average activity is zero, the $w_T$ value will coincide with
the freshness value, which is strictly positive; otherwise, if $f_T(v) =
0$, the $w_T$ value will equal one. It should be noted that $w_T$ will
hence be 1 if either the freshness and average activity are both maximum
or $T$ is not relevant to the timespan over which the user has been
active: this is admissible in our theory since we want to exploit
information about the user activity only if this is available in a given
time interval. The rationale behind the $w_T$ value assigned to a vertex
$v$ is to add a multiplicative factor that is inversely (resp. directly)
proportional, otherwise neutral, to the size of the in-neighborhood
$in(v)$ (resp. size of the out-neighborhood $out(v)$) in the formulation
of our time-static LurkerRank algorithm.

Analogously to $w_T(\cdot)$, we define the function $w_T(\cdot, \cdot)$
in terms of the freshness and average activity of interaction calculated
for any $u, v \in \mathcal{V}$ such that $(u, v) \in \mathcal{E}$, as
follows:

$$w_T(u, v) = \begin{cases} \omega_f f_T(u, v) + \omega_a \bar{a}_T(u, v), & \text{if } f_T(u, v) \neq 0,\ \bar{a}_T(u, v) \neq 0 \\ f_T(u, v), & \text{if } f_T(u, v) \neq 0,\ \bar{a}_T(u, v) = 0 \\ 0, & \text{otherwise} \end{cases} \tag{10}$$

Compared to Eq. (9), note that the expression in Eq. (10) evaluates to
zero if the freshness is zero. The reason will become clear when we show
the use of $w_T(\cdot, \cdot)$ in a negative-exponential smoothing term
that is present in the definition of our time-static LurkerRank
algorithm.
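
The case analysis of Eqs. (9) and (10) can be written down directly. The Python sketch below uses hypothetical function names and assumes that, when both freshness and average activity are non-zero, the weight is the linear combination $\omega_f \cdot f + \omega_a \cdot \bar{a}$; the remaining cases follow the textual description (fall back to the freshness alone, otherwise 1 for users and 0 for interactions).

```python
def user_weight(f, a, w_f=0.5, w_a=0.5):
    """User-level weight, following the cases of Eq. (9).

    f: user freshness, a: user average activity, both in [0, 1].
    """
    if f != 0 and a != 0:
        return w_f * f + w_a * a  # assumed linear combination
    if f != 0:
        return f                  # activity unavailable: freshness only
    return 1.0                    # neutral factor: no information in T


def interaction_weight(f, a, w_f=0.5, w_a=0.5):
    """Interaction-level weight, following the cases of Eq. (10)."""
    if f != 0 and a != 0:
        return w_f * f + w_a * a  # assumed linear combination
    if f != 0:
        return f
    return 0.0                    # zero freshness yields zero weight
```

The only difference between the two functions is the fallback value: a user with no recorded activity in $T$ gets a neutral weight of 1, whereas an interaction with zero freshness contributes nothing.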

We are now ready to provide our formulation of the time-static
LurkerRank algorithm, hereinafter denoted as Ts-LR, which involves both
functions $w_T(\cdot)$ and $w_T(\cdot, \cdot)$ defined above.
Time-static LurkerRank shares with the basic LurkerRank formulation the
way the in-neighbors-driven lurking term is combined with the
out-neighbors-driven lurking term, that is, for any user
$v \in \mathcal{V}$ and temporal interval of interest $T$:

$$\text{Ts-LR}_T(v) = d\,[\mathcal{L}_{in}(v)\,(1 + \mathcal{L}_{out}(v))] + (1 - d)/|\mathcal{V}| \tag{11}$$

However, the in-neighbors-driven lurking function $\mathcal{L}_{in}(v)$
is now defined as:¹

i: (12) 

iniu) 

and the out-neighbors-driven lurking function £out(^) 
as: 




in{v) 




exp - E w{v, 'I 

\ uGRjj 


^ ^Ts-LRt{u) ( 13 ) 


uGitt, 


it(u) 


2.2.3 The time-evolving LurkerRank algorithm 

The time-static LurkerRank can work only on a subset of relational data
that are restricted to a particular subinterval of the network timespan.
Therefore, information on the sequence of events concerning users'
(re)actions is lost as relations are aggregated into a single snapshot.
To overcome this issue, we define here an alternative formulation of
time-aware LurkerRank that is able to model, for each user $v$, the
potential accumulated over a time-window of the contribution that each
in-neighbor $u$ had to the computation of the lurking score of $v$.


¹ Note that, for the sake of simplicity, we have omitted the subscript
$T$ in the freshness and activity trend functions, in the weighting
function as well as in the $in$ and $out$ functions, since the reference
interval of interest $T$ is assumed clear from the context. Analogously,
we override the function symbols $\mathcal{L}_{in}(v)$ and
$\mathcal{L}_{out}(v)$ given in Eqs. (2)-(3), since they will never be
referenced outside the Ts-LR setting.

Cumulative freshness and activity functions. We begin with the
definition of a cumulative scoring function which forms the basis for
each of the subsequent functions that will apply to the previously
defined freshness and average activity at user and interaction level.
Intuitively, this cumulative scoring function ($g^{\le}$) should be
defined at any time $t \in T$ to aggregate all values of a function $g$
(defined in $T$) computed at times $t' \in T$ less than or equal to $t$,
following an exponential-decay model:

$$g^{\le}(t) \propto g(t) + \sum_{t' < t} (1 - 2^{t' - t})\, g(t') \tag{14}$$

Let the timespan $T$ of the network graph be partitioned in consecutive
sub-intervals $T_1, T_2, \ldots, T_i, \ldots = [t_0, t_1], (t_1, t_2],
\ldots, (t_{i-1}, t_i], \ldots$. The generic cumulative scoring function
$g^{\le}(\cdot)$ has a straightforward translation in terms of
user-freshness: if $t_i$ corresponds to the end-time of the span of
interest whose latest sub-interval is $T_i$, we define the cumulative
user-freshness function applied to any user $u$ to integrate (with
exponential decay) all user-freshness values individually obtained at
each sub-interval preceding $t_i$:

$$cf_{T_i}(u) = f_{T_i}(u) + \sum_{t_k < t_i} (1 - 2^{(t_k - t_i)})\, f_{T_k}(u) \tag{15}$$

Our cumulative user-activity function, which we denote with
$ca_{T_i}(\cdot)$, has a similar form. Formally, for every
$u \in \mathcal{V}$, we have:


$$ca_{T_i}(u) = \bar{a}_{T_i}(u) + \sum_{t_k < t_i} (1 - 2^{(t_k - t_i)})\, \bar{a}_{T_k}(u) \tag{16}$$


The definitions of cumulative freshness of interaction, $cf_{T_i}(u,
v)$, and cumulative activity of interaction, $ca_{T_i}(u, v)$, at each
$T_i$, and for every $(u, v) \in \mathcal{E}$, follow intuitions
analogous to Eq. (15) and Eq. (16), respectively.

The values yielded by the above defined four functions of cumulative
freshness and activity, at user as well as at interaction level, are
then normalized and multiplied by the corresponding information in the
transient model, that is, for every $u \in \mathcal{V}$:

$$cf'_{T_i}(u) = \frac{cf_{T_i}(u)}{\max_j cf_{T_j}(u)} \cdot f_{T_i}(u) \tag{17}$$

$$ca'_{T_i}(u) = \frac{ca_{T_i}(u)}{\max_j ca_{T_j}(u)} \cdot \bar{a}_{T_i}(u) \tag{18}$$

The user-interaction function counterparts have a form analogous to
Eq. (17) and Eq. (18). Our motivation for adopting a (multiplicative)
combination of a normalized cumulative freshness/activity function with
a transient freshness/activity function is that we want to ensure that
the freshness/activity information cumulated through times preceding a
target time $T_i$ will be valued w.r.t. the actual contribution (in
terms of freshness/activity) that the user provides in the OSN at the
given time $T_i$.
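
The normalize-then-multiply step of Eqs. (17) and (18) can be sketched as follows. The sketch takes the per-interval cumulative values as given inputs (it does not recompute them), and both the function name and the handling of an all-zero cumulative series are illustrative assumptions.

```python
def normalized_cumulative(cumulative, transient):
    """Combine cumulative and transient scores as in Eqs. (17)-(18).

    cumulative: cumulative values of one user, one per sub-interval T_i.
    transient:  transient values of the same user, aligned by index.
    Each cumulative value is divided by the user's maximum cumulative
    value and multiplied by the transient value of the same interval.
    """
    if len(cumulative) != len(transient):
        raise ValueError("series must be aligned by sub-interval")
    peak = max(cumulative, default=0.0)
    if peak == 0:
        # Assumption: no accumulated information yields zero scores.
        return [0.0 for _ in cumulative]
    return [c / peak * x for c, x in zip(cumulative, transient)]


scores = normalized_cumulative([1.0, 2.0, 4.0], [1.0, 0.5, 0.25])
```

In this toy case the accumulated history grows while the transient contribution shrinks at the same rate, so the combined score stays constant at 0.25; a user who stops contributing at $T_i$ is scored 0 there regardless of history.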

The time-evolving LurkerRank algorithm, hereinafter denoted as Te-LR,
follows a formula that is analogous to that of Ts-LR. However, Te-LR
adopts new weighting functions, which we denote as $cw_T(\cdot)$ and
$cw_T(\cdot, \cdot)$, with the following properties:

• they have an analytical form that is identical to $w_T(\cdot)$, given
in Eq. (9), and $w_T(\cdot, \cdot)$, given in Eq. (10), respectively;

• they are defined, at user level, in terms of the functions
$cf'_T(\cdot)$ and $ca'_T(\cdot)$ (given in Eq. (17) and Eq. (18)), and
at user-interaction level, in terms of the functions $cf'_T(\cdot,
\cdot)$ and $ca'_T(\cdot, \cdot)$.


3 Temporal Analysis of Lurkers 

We present here our multi-faceted analysis of lurkers along the time
dimension. In Section 3.1 we frame seven research questions, which span
different problems of interest to our study. We describe the data that
will be used for our evaluation in Section 3.2. Then, Sections 3.3-3.9
will contain our answer to each of the stated questions.


3.1 Outline of research questions 

Our study of lurkers and lurking behaviors across time 
is built upon seven research questions, which are stated 
as follows. 

Q1: Do lurkers match zero-contributors? Definitions of lurking are often
related to nonposting behavior. Our first research question is aimed at
gaining insights into the correspondence between inactive users and
lurkers over time. Inactive users are here intended as
"zero-contributors", i.e., users who have never posted or provided a
comment/favorite-mark.

Q2: Do lurkers match newcomers? Lurking can depend on a temporary status
of learning the etiquette of the community and the proper usage of the
services provided by an OSN. This also relates to newcomers, i.e., users
that have started to participate in some activity. Therefore,
analogously to our first research question, we also analyze the relation
between newcomers and lurkers over time.

Q3: Do lurkers create preferential relations with active users? In the
third research question, our goal is to unveil the dynamics of the
binding between lurkers and active users, and how this relates to the
popularity of the active users.

Q4: How frequently do lurkers respond to the others' actions? Lurkers
can show a limited amount of activity in response to others'
contributions to the community life. We are interested in measuring the
distribution of the time latency between repeated actions (i.e.,
comments or favorite-marks) by a user in response to his/her followees.

Q5: How do lurking trends evolve? Our fifth research question focuses on
how lurking trends change over time, how they can be grouped together,
and whether characteristic patterns may arise to indicate different
profiles of lurkers.

Q6: How do topical interests of lurkers evolve? We are also interested
in exploring topic-sensitive evolution patterns of lurking behavior. We
analyze the topical usage of lurkers, how lurkers change their topical
patterns, and whether these changes might differ from those of the other
users.

Q7: Can time-aware models improve the ranking of lurkers? In our final
research question, we investigate the impact of using our proposed
time-transient and time-cumulative ranking models on the quality of
lurker ranking solutions. We provide a quantitative analysis of results
obtained by our developed time-aware LurkerRank algorithms, with respect
to a data-driven evaluation of the ranking performance. We also offer a
comparison with a state-of-the-art time-aware ranking algorithm [7].


3.2 Data 

We used data from Flickr, FriendFeed and Instagram networks to conduct
our analysis. A major motivation underlying our data selection is that
we wanted to use datasets that have been previously studied in research
and that contain timestamped information on the activities of users and
their relationships, including fellowships, comments, or
like/favorite-markings. The Flickr dataset was originally collected in
2006-2007, FriendFeed refers to the latest (2010) version of a
previously studied dataset, while Instagram is our own dump crawled in
2014. Flickr and FriendFeed have also been selected to be consistent
with our previous analysis of lurkers [55]. We refer the reader to the
original works that used Flickr and FriendFeed, and to our submission
support page available at http://uweb.dimes.unical.it/tagarelli/timelr/
for the description of the Instagram dump.

Note that our selected datasets are rather heterogeneous in terms of
features concerning user relationships: Flickr contains timestamps of
34.7M favorite markings assigned to the uploaded photos, and also
contains (inferred) timings on the user subscriptions. In Instagram,
every link between v (follower) and u (followee) is annotated with the
number and timestamp of v's comments to media posted by u (about 2M
comments and 1.7M likes). Analogous to Instagram is the situation in
FriendFeed, but for information concerning likes (≈230K) and comments
(>687K) to posts.

Table 1 Main structural characteristics of the evaluation network
datasets.

data            # nodes     # links      avg      avg        clust.  assorta-
                                         in-deg.  path len.  coef.   tivity
averages over time-varying snapshot graphs
Flickr-social   1,889,102   25,265,343   13.25    4.41       0.108    0.009
Flickr            215,429    1,483,462    6.85    4.69       0.025   -0.013
FriendFeed          6,962       64,509    5.15    5.89       0.071   -0.043
Instagram          10,353       31,215    2.94    5.83       0.083    0.217
full (static) social graphs
Flickr          2,302,925   33,140,018   14.39    4.36       0.107    0.015
FriendFeed        493,019   19,153,367   38.85    3.82       0.029   -0.128
Instagram          54,018      963,833   17.85    4.50       0.048   -0.067

Table 1 summarizes the main structural characteristics of the network
datasets we used in our evaluation. The table is organized in two
subtables. The upper subtable contains statistics on timestamped
snapshot graphs averaged over the network-specific timespan, which was
binned at month level; more precisely, in order to have uniformly-sized
snapshots, we aggregated them on a 28-day (i.e., 4-week) basis. Note
also that all snapshots refer to interaction subgraphs except the first
row in the table, which corresponds to the timestamped fellowship
(social) subgraphs of Flickr. The timespans covered by the datasets are
7 months for Flickr (2006/11/02 - 2007/05/17 for the fellowship
subgraphs and 2006/09/08 - 2007/03/22 for the interaction subgraphs), 7
months for FriendFeed (2010/04/09 - 2010/09/30), and 20 months for
Instagram (2012/06/28 - 2013/12/18). The lower subtable contains
statistics on the full (i.e., static) social graphs of Flickr,
FriendFeed and Instagram.
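
The 28-day binning described above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the first snapshot starts exactly at the first day of the dataset timespan, a detail the text does not state.

```python
from collections import defaultdict
from datetime import date


def snapshot_index(day, start, width_days=28):
    """Index of the fixed-width snapshot that contains the given day.

    Assumption: snapshot 0 starts at `start` and every snapshot spans
    `width_days` consecutive days.
    """
    return (day - start).days // width_days


def group_into_snapshots(events, start, width_days=28):
    """Group (day, payload) events by their snapshot index."""
    snapshots = defaultdict(list)
    for day, payload in events:
        snapshots[snapshot_index(day, start, width_days)].append(payload)
    return dict(snapshots)


start = date(2010, 4, 9)  # first day of the FriendFeed timespan
events = [(date(2010, 4, 9), "a"), (date(2010, 5, 6), "b"),
          (date(2010, 5, 7), "c")]
groups = group_into_snapshots(events, start)
```

Fixed-width bins keep snapshots comparable in size, which calendar months (28 to 31 days) would not.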

3.3 Lurkers vs. inactive users 

Our first question (Q1) focuses on the relation between lurkers and inactive users, also referred to as zero-contributors.

To answer this question, we initially analyzed how much the set of zero-contributors overlaps with the set of users having an in/out-degree ratio higher than one, here dubbed “potential lurkers”. When considering the static picture of a network dataset, one remark is that the set overlap between zero-contributors and potential lurkers may vary from 12% (favorite-based interaction network in Flickr) to 72% and 95% (comment-based interaction networks in FriendFeed and Instagram, respectively). Moreover, since the relative difference in size of the two sets can vary from one dataset to another, we also computed the overlap ratio w.r.t. the set of potential lurkers, which was found to be 57% on Flickr, 62% on Instagram, and 96% on FriendFeed. There are hence clues that the overlap (or overlap ratio) would be relatively smaller when favorite/like interactions are taken into account, that is, potential lurkers are more likely to behave similarly to inactive users when activity is regarded in terms of commenting.

Fig. 1 Overlap ratio of zero-contributors against potential lurkers and top-ranked lurkers: distributions over weekly snapshots of the Flickr network. The inset shows the weekly distributions of zero-contributors and potential lurkers.

Fig. 2 Overlap ratio of newcomers and top-ranked lurkers in monthly snapshots.
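The overlap ratio with respect to potential lurkers, as used in this subsection, can be illustrated with a minimal sketch. The function names and the toy degree maps are hypothetical; testing the in/out-degree ratio as `in > out` (which also admits users with zero out-degree) is an assumption of this sketch, not a detail stated in the text.

```python
def potential_lurkers(in_deg, out_deg):
    """Return users whose in/out-degree ratio is higher than one.

    Assumption: the ratio test is expressed as in-degree > out-degree,
    so users with zero out-degree and positive in-degree qualify too.
    """
    users = set(in_deg) | set(out_deg)
    return {u for u in users if in_deg.get(u, 0) > out_deg.get(u, 0)}


def overlap_ratio(zero_contributors, reference):
    """Fraction of the reference set that consists of zero-contributors."""
    if not reference:
        return 0.0
    return len(zero_contributors & reference) / len(reference)


# Toy example (hypothetical data).
in_deg = {"a": 5, "b": 1, "c": 3}
out_deg = {"a": 1, "b": 4, "c": 0}
lurkers = potential_lurkers(in_deg, out_deg)      # {"a", "c"}
ratio = overlap_ratio({"c", "d"}, lurkers)        # 0.5
```

The same `overlap_ratio` function could be applied to a set of top-ranked lurkers instead of potential lurkers, by passing that set as `reference`.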

We further investigated how the relation between inactive and lurking users evolves over time. In this analysis, we also included the set of top-ranked users obtained by our LurkerRank. Figure 1 shows the temporal trends of overlap ratios w.r.t. potential lurkers, top-5% and top-25% ranked lurkers, on Flickr. Interestingly, the overlap ratios remain rather unaffected over time, despite the jump in frequency at the 14th week (displayed in the inset). The series of top-5% ranked lurkers is always above the other two series (up to 0.15), which in turn roughly match. Note that in the inset, the distributions of potential lurkers and zero-contributors actually follow close trends, although they are scaled differently (by one order of magnitude).

3.4 Lurkers vs. newcomers 

Similarly to the previous analysis, in our second research question (Q2) we investigated whether and to what extent lurkers and newcomers overlap at any given time in our evaluation networks.

To this end, we regarded a user as a newcomer at time t if s/he was not involved in any interaction with other users at any time t' < t, while lurkers were identified at each time t. Figure 2 shows, for each dataset and for the top-5%, top-10% and top-25% of the LurkerRank solutions, two series over a six-month timespan: the fraction of newcomers that were recognized as lurkers (solid lines) and the fraction of lurkers that were also newcomers (dashed lines).
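The two series described above can be computed with a short sketch such as the following. It assumes, for illustration only, that each snapshot is given as the set of users involved in at least one interaction at that time, and that a newcomer at time t must appear in the snapshot at t; the function name is hypothetical.

```python
def newcomer_lurker_fractions(snapshots, top_lurkers):
    """Per time step, return a pair:
    (fraction of newcomers that are lurkers,
     fraction of lurkers that are newcomers).

    snapshots   -- list of sets: users interacting at each time step
    top_lurkers -- list of sets: top-ranked lurkers at each time step
    """
    seen = set()            # users involved in any earlier interaction
    series = []
    for active, lurkers in zip(snapshots, top_lurkers):
        newcomers = active - seen
        both = newcomers & lurkers
        f_newcomers = len(both) / len(newcomers) if newcomers else 0.0
        f_lurkers = len(both) / len(lurkers) if lurkers else 0.0
        series.append((f_newcomers, f_lurkers))
        seen |= active
    return series


# Toy example (hypothetical data).
snaps = [{"a", "b"}, {"b", "c", "d"}, {"d", "e"}]
tops = [{"a"}, {"c"}, {"d", "e"}]
fractions = newcomer_lurker_fractions(snaps, tops)
```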

In the case of interactions as Flickr favorite-marking actions, shown in Fig. 2(a), we observe that the fraction of lurkers matching newcomers varies from about 30% down to 20%, following the same trend over the timespan regardless of the top-% selected from the LurkerRank solution; by contrast, the trend of the fraction of newcomers matching lurkers is more constant (and slightly increasing) over the timespan, achieving values below 10% for top-5% and top-10% lurkers, and around 20% for top-25% lurkers.

Considering a comment-based interaction scenario, shown in Fig. 2(b)-(c), again the trend of the fraction of newcomers matching lurkers looks roughly constant over time. As concerns the fraction of lurkers matching newcomers, it is within 50-20% in FriendFeed, but below 10% on average in Instagram.

We tend to believe that the difference in matching (between lurkers and newcomers) among the various scenarios might be explained by inherent characteristics of an OSN, rather than by the type of interaction (as previously observed in the evaluation of inactive users versus lurkers). Nevertheless, we would like to point out that our research objective in comparing lurkers with newcomers is consistent with previous research focused on the analysis of newcomers’ behavior in an OSN: in fact, as found by Burke et al. [12], newcomers’ behavior can be explained by examining how they tend to be engaged in content production activities by observing their friends’ actions. This is nothing less than a form of Bandura’s observational learning [5], i.e., learning through being given access to the learning experiences of other users; as widely studied in social science and human-computer interaction, observational learning and lurking are related to each other.

Fig. 3 Distribution of active users as a function of the lurkers-followers, (a) Flickr and (b) FriendFeed, and distribution of lurkers as a function of the active users-followees, (c) Flickr and (d) FriendFeed.


3.5 Preferential attachment 

Our third research question (Q3) focuses on understanding whether relations between lurkers and the active users they are linked to can be explained in terms of power law and preferential attachment. To this purpose, we selected the set of lurkers and the set of active users respectively from the top and the bottom of the LurkerRank solution.

We first investigated whether the probability of observing active users with a certain degree of attached lurkers, and vice versa, can be predicted by a power law. Figure 3 shows the distribution of active users as a function of the number of attached lurkers, and the distribution of lurkers as a function of the number of attached active users, obtained on the Flickr and FriendFeed followship graphs, using the top-25% and bottom-25% of the LurkerRank solution. We computed the best fit of a power-law distribution to the observed data, and assessed the statistical significance of the fitting by a Kolmogorov-Smirnov test. From the figure it can be noted that the plots follow a power-law behavior. The exponents of the fitted power-law distributions are 1.725 (x_min = 1) and 1.363 (x_min = 1) for Flickr (Fig. 3(a) and (c), resp.), 2.015 (x_min = 315) and 1.679 (x_min = 99) for FriendFeed (Fig. 3(b) and (d), resp.). In all cases, the power-law fitting is statistically significant, with Kolmogorov-Smirnov test statistic (resp. p-value) of 0.0236 (resp. 0.8006) for Fig. 3(a), 0.0396 (resp. 0.7662) for Fig. 3(c), 0.0516 (resp. 0.9946) for Fig. 3(b), and 0.0546 (resp. 0.9161) for Fig. 3(d).
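As a rough illustration of this kind of analysis, the sketch below estimates a power-law exponent by maximum likelihood and computes the Kolmogorov-Smirnov distance between the data and the fitted model. It is not the fitting procedure used for the results above: it assumes a continuous power law with a given x_min, does not compute a p-value, and all function names (including the synthetic-data helper) are hypothetical.

```python
import math
import random


def fit_power_law(samples, x_min):
    """Continuous maximum-likelihood estimate of alpha for
    p(x) proportional to x^(-alpha), restricted to x >= x_min."""
    tail = [x for x in samples if x >= x_min]
    alpha = 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)
    return alpha, tail


def ks_statistic(tail, x_min, alpha):
    """Largest gap between the empirical CDF of the tail and the
    CDF of the fitted power law, F(x) = 1 - (x / x_min)^(1 - alpha)."""
    xs = sorted(tail)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)
        gap = max(gap, abs(model - i / n), abs(model - (i + 1) / n))
    return gap


def sample_power_law(n, x_min, alpha, seed=0):
    """Hypothetical helper: synthetic data via inverse-transform sampling."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


data = sample_power_law(5000, x_min=2.0, alpha=2.5)
alpha_hat, tail = fit_power_law(data, x_min=2.0)
distance = ks_statistic(tail, 2.0, alpha_hat)
```

A small Kolmogorov-Smirnov distance, together with a large p-value as reported in the text, indicates that the power-law hypothesis cannot be rejected for the observed data.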

Our main goal in answering question Q3 is to explain the relation between lurkers and active users in terms of preferential attachment, that is, we hypothesize that lurking connections are attached preferentially to active users that already have a large number of connected lurkers. Following the lead of [12], we studied two separate cases of “attachment”, which differently rely on a user’s in-neighborhood or out-neighborhood. However, in our context, this type of analysis becomes more complicated since nodes (and their neighborhoods) must be selected according to their different status as either lurker or active user. Intuitively, two cases of preferential attachment can be considered, namely: new connections received by active users for any k lurkers, and new connections produced by lurkers for any k active users.

We initially investigated the two cases of preferential attachment according to the timestamped followship information in a network. Figure 4 shows results obtained on Flickr, averaged per user and per week, for each k. It can be noted that the number of lurkers shows a good linear correlation with the average number of new links received by active users (left-hand side of Fig. 4): the least-squares linear fit has a slope of 0.00836, which means that on average active users receive per week one new connection from lurkers for every 120 lurker-followers that they already have. By contrast, given a correlation of -0.11, no linear trend exists when studying the new connections produced by lurkers for any k active users (right-hand side of Fig. 4). Therefore, it is unlikely that lurkers following a high number of active users will create new connections towards other active users.

Fig. 4 Timestamped followship-based evaluation of preferential attachment between lurkers vs. active users. New connections are detected for each weekly-aggregated network, on Flickr.

Fig. 5 Timestamped interaction-based evaluation of preferential attachment between lurkers vs. active users. New connections are detected for each weekly-aggregated network.

Fig. 6 Responsiveness frequency: empirical cumulative distribution function (ecdf) plots of user reaction latency (in days), based on favorites in Flickr (a) and comments in Instagram (b). (Best viewed in color.)
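The reading of the fitted slope as "one new connection for every 1/slope lurkers" can be made concrete with a small sketch. The least-squares fit and Pearson correlation below are standard formulas; the function names and the toy data are hypothetical and only serve to reproduce the arithmetic behind the figure of 120 lurker-followers.

```python
import math


def linear_fit(xs, ys):
    """Least-squares slope and intercept, plus Pearson correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return slope, intercept, correlation


def lurkers_per_new_link(slope):
    """Number of already-attached lurkers associated with one new link."""
    return 1.0 / slope


# Toy data lying exactly on a line with the slope reported for Flickr.
ks = [0, 100, 200, 300]
new_links = [0.00836 * k for k in ks]
slope, intercept, corr = linear_fit(ks, new_links)
per_link = lurkers_per_new_link(0.00836)   # about 119.6, i.e. roughly 120
```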

We further explored the preferential attachment evaluation between lurkers and active users focusing on timestamped interaction information in a network. Specifically, we considered user interaction based on likes or comments in Instagram, and on comments in FriendFeed. Figure 5 shows results obtained on the two datasets that concern the correlation between the number of lurkers (k) and the average number of new links received by active users for any given k, on a weekly basis. Correlation is moderate (0.34 on Instagram, 0.52 on FriendFeed), while, in terms of least-squares linear fit, the two distributions have a slope of 0.00570 (Instagram) and 0.06585 (FriendFeed), which correspond to having one new interaction (i.e., posted comment) from lurkers per active user and week for every 176 and 15, respectively, lurkers that have already interacted. However, compared to the analogous situation on weekly-aggregated Flickr followship networks in the left-hand side of Fig. 4, both distributions in Fig. 5 have lower correlation and also lower size, which can be explained by the fact that temporal information about user interactions (i.e., likes/comments) in both networks is relatively sparse with respect to that about followships. Indeed, by aggregating interaction relationships on a monthly basis, correlation increases (0.47 on Instagram, 0.66 on FriendFeed), along with a decrease of the amount of preferential attachment (one new interaction from lurkers per active user and month for every 51 and 4 interacting lurkers on Instagram and FriendFeed, respectively). Finally, concerning new connections produced by lurkers for any k active users, we observed very sparse distributions with null correlation, on both datasets and regardless of the temporal grain of the aggregated networks. This means that lurkers that have a higher number of active users as recipients of their likes/comments are not more prone to have new interactions with other active users.

3.6 Responsiveness 

Concerning question Q4, we aim at estimating how frequently lurkers react to the postings of other users.

We examined the distribution of time differences (in days) between any two consecutive responsive actions made by a user w.r.t. a post created by her/his followees. Figure 6 shows the empirical cumulative distribution functions over the first 90 days, for comments on Instagram and for favorites on Flickr. Each of the plots in the figure compares the distributions obtained for top-5% and top-25% lurkers with the distribution corresponding to all users in the network.

Fig. 7 Clustering of the time series representing LurkerRank scores in time-evolving graph networks: (a) FriendFeed daily snapshots built on like+comment relations, (b) Flickr weekly snapshots built on favorite relations, and (c) Instagram monthly snapshots built on comment relations. Warmer colors correspond to series with higher cluster-membership. (Best viewed in color.)

We observe that the lurkers’ responsiveness generally takes several days, or weeks, although the latency between any two consecutive responsive actions may significantly vary in the two networks. Focusing on the 80% of responses (i.e., 0.8 on the y-axis of the plots), the latency is up to 18 days in Flickr, with no evident difference regarding the fraction of top-ranked lurkers considered; by contrast, in Instagram, the top-25% lurkers take more than three weeks to respond, and even longer (40 days) in the case of top-5% lurkers. Moreover, compared with the responsiveness of all users, lurkers tend to react more slowly, up to 20 days more in Instagram; however, the gap with respect to all users is only a few days in both networks when the fraction of top-ranked lurkers is large (25%). Thus, more time-consuming responsive actions, like comments in Instagram, would explain not only the increase in the lurkers’ response latency but also the relative difference with the generic case of all users.
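The latency values read off the ecdf plots (e.g., the latency covering 80% of responses) correspond to a simple computation, sketched below. The function names are hypothetical, and timestamps are assumed to be expressed in days for a single user.

```python
def response_gaps(timestamps_in_days):
    """Time differences between consecutive responsive actions of a user."""
    ts = sorted(timestamps_in_days)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def ecdf(values):
    """Empirical cumulative distribution as a list of (value, fraction)."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def latency_at(values, fraction):
    """Smallest latency whose ecdf value reaches the given fraction."""
    for x, p in ecdf(values):
        if p >= fraction:
            return x
    raise ValueError("fraction must be in (0, 1]")


# Toy example (hypothetical action times, in days).
gaps = response_gaps([0, 1, 3, 6, 10, 30])   # [1, 2, 3, 4, 20]
latency_80 = latency_at(gaps, 0.8)           # 4
```

In an analysis like the one above, the gaps of all users in a group (e.g., top-5% lurkers) would be pooled before computing the ecdf.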


3.7 Temporal trends and clustering 

In our fifth research question (Q5), we analyze how lurking trends evolve, focusing on unveiling the structures hidden in such evolving trends.

We pursued this goal as a task of clustering of time series representing the users’ lurking profiles. The basis for this clustering analysis lies in repeatedly applying our LurkerRank to successive snapshots of a network dataset. Since the snapshots can vary in size, LurkerRank scores were first normalized to be comparable across different times. We then generated a time series of the normalized LurkerRank scores for every user in the dataset. The resulting set of time series was the input for our clustering task.

Table 2 Summary of LDA-learned topics in our Instagram dataset.

LDA topic ids | topic-set label | main descriptors (i.e., media tags) of topic-set | subnetwork-induced size
0, 6, 10 | nature | sky, sunset, whpflowerpower, whpsignsoftheseason, clouds, nature, landscape, sea, beach, flowers, water, trees, hiking, summer, fall, autumn | 8,185
12, 14 | architecture | whpstraightfacades, architecture, building, instaworld_shots, streetphotography, spain, madrid, paris, france, london, sicily, design, arquitectura, youmustsee | 2,884
13 | fun | love, me, swag, lol, fun, like, awesome, cool, happy, food | 1,314
16 | pets | whppetportraits, cats, caturday, catstagram, dog, cute, pets, kitty, catsofinstagram, petsofinstagram | 3,124
19 | video | whpmovingphotos, whpreplacemyface, whpbigreveal, whpfilmedfromabove, instavideo, video, whpmovingportrait, movies, videogram, instagramvideo | 3,062
1, 2, 7 | miscellanea | whpthroughthetrees, ig_captures, whpmyhometown, whpliquidlandscape, whpemptyspaces, whpmotherlylove, whpthanksdad, whpstraightfacades, whpmyfavoriteplace, whpfirstphotoredo, whpstrideby | 16,573
8, 18 | travel | worldunion, whpmyfavoriteplace, travel, world_shotz, worldcaptures, worldplaces, igworldclub | 1,200
3, 4, 5, 17, 11 | attention-seeking | instagood, instamood, photooftheday, pleasecomment, pleaseshoutout, teamfollowback, igers, picoftheday, instadaily, bestoftheday, webstagram, iphonesia, igdaily | 5,794
9, 15 | photo art | whpsilhouettes, whpselfportrait, whplookingup, whpreflectagram, selfie, blackandwhite, whpbehindthelens, whpstilllife, silhouette, bnw, monochrome | 11,882

We adopted a soft clustering approach to group the time series of LurkerRank scores. This implies that a time series is allowed to obtain fuzzy memberships to all clusters. Our choice is motivated by the suspicion that the natural clusters to be detected in this kind of time-course data might not be well separated; rather, they might frequently overlap. A suitable method to detect clusters in this kind of data is based on fuzzy c-means clustering. We used a particularly efficient implementation, provided by the Mfuzz R package (http://www.bioconductor.org/packages/release/bioc/html/Mfuzz.html), based on minimization of the weighted square error function. Note that since the clustering is performed in Euclidean space, the time series were standardized to have a mean value of zero and a standard deviation of one. This preprocessing step ensures that series with similar variations are close in Euclidean space. As concerns the setting of the fuzzifier and the number of clusters required by the clustering algorithm, we followed the methodology suggested in the literature and summarized in our submission support page available at http://uweb.dimes.unical.it/tagarelli/timelr/.
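To make the standardization and fuzzy-membership steps of the clustering analysis more tangible, here is a minimal, dependency-free sketch of fuzzy c-means. It is not the Mfuzz implementation: the deterministic farthest-first initialization, the fixed iteration count, the use of the population standard deviation, and all function names are assumptions made for illustration.

```python
import math


def standardize(series):
    """Rescale a series to zero mean and unit (population) std deviation.
    Assumes the series is not constant."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [(x - mean) / std for x in series]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(point, centers, m):
    """Fuzzy membership of one series in every cluster (sums to 1)."""
    d = [max(sq_dist(point, c), 1e-12) for c in centers]
    return [1.0 / sum((dj / dk) ** (1.0 / (m - 1.0)) for dk in d) for dj in d]


def fuzzy_c_means(data, c, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of steps."""
    centers = [list(data[0])]
    while len(centers) < c:  # farthest-first initialization (assumption)
        far = max(data, key=lambda p: min(sq_dist(p, q) for q in centers))
        centers.append(list(far))
    for _ in range(iters):
        u = [memberships(p, centers, m) for p in data]
        centers = [
            [sum(u[i][j] ** m * data[i][t] for i in range(len(data)))
             / sum(u[i][j] ** m for i in range(len(data)))
             for t in range(len(data[0]))]
            for j in range(c)
        ]
    return centers, [memberships(p, centers, m) for p in data]


# Toy example: two rising and two falling score series (hypothetical).
raw = [[0, 1, 2, 3, 4], [0, 1, 2, 3, 5], [4, 3, 2, 1, 0], [5, 3, 2, 1, 0]]
series = [standardize(s) for s in raw]
centers, u = fuzzy_c_means(series, c=2)
```

Each row of `u` holds the memberships of one series in all clusters; the color encoding mentioned for the cluster plots corresponds to such membership values.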

Figure 7 shows some of the clustering results we obtained on the evaluation datasets. For this analysis, we initially selected the top-25% lurkers of the snapshot at time zero, then kept only those users appearing in at least 50% of the subsequent snapshots. Results correspond to different scenarios, both in terms of time-granularity (which impacts on the time series length) and type of relation (i.e., comments, favorite-marks, likes plus comments) underlying the graphs from which the time series were generated. Note that the membership values of time series are color-encoded in the plots, which facilitates the identification of temporal patterns in the clusters.

It can be noted from the figure that some cases are characterized by quite evident trends. For instance, on Flickr, cluster #2 groups lurkers whose behavior (lurking scores) evolves in the form of a series with an initial plateau followed by an increasing ramp, then a decreasing ramp, and finally a new stagnation trend. A similar situation is depicted by another cluster on Flickr. On FriendFeed, clusters #1, #3 and #4 present a more or less marked period of roughly constant lurking behavior between the 24th and 36th weeks, along with various peaks in the heads or tails of the series, which would hint at particularly critical (passive) periods of lurking. In general, more time-consuming actions (i.e., comments on Instagram, like+comments on FriendFeed) tend to correspond to trends that present sharper upward/downward shifts, and to clusters with more noisy data. Finally, note that, except for one cluster on Flickr, lurking series do not tend to group into decreasing trends, which would suggest that lurkers are not likely to spontaneously “de-lurk” themselves, i.e., to turn their behavior into a more active participation in the community life.

3.8 Topical evolution 

Our sixth research question (Q6) concerns the analysis of topic-sensitive evolution patterns of lurking behavior. This involves a characterization of the topical usage of lurkers, of how their topical patterns evolve, and of whether these may differ from those of the other users.

To answer this question, we employed a statistical topic model to learn the topics of interest exhibited by the users. More specifically, we used an efficient implementation, provided in the gensim library (http://radimrehurek.com/gensim/), of the well-known Latent Dirichlet Allocation (LDA) [TU]. For the sake of brevity, we focus here on the presentation of results that we obtained on the Instagram dataset, for which we regarded all media of a user as a single document and the tags assigned by users to their media as document features. We filtered out tags occurring in fewer than five documents or in more than 75% of the documents in the collection. We tested our topic model with 5 to 50 latent topics, in increments of 5, executing up to 100 iterations; upon a manual inspection of the description of topics learned by the LDA models, we adopted the model with 20 topics as the most “interpretable” one. Our decision was indeed taken based on obtaining a topic description as sharp and rich as possible in terms of both characteristic and discriminating features. We remark that the topics extracted by our selected LDA model are consistent with a previous study on topical interests that was performed on a similar dump of the Instagram media dataset [53]. That study showed how the most popular tags in Instagram concern a limited number of categories, or coarse-grain topics, which include: nature, travels, photography-related technical aspects, usage of popular applications for photo/video editing and publishing (e.g., Latergram, VSCO Cam), attention-seeking and microcommunity-focused tags (e.g., #photooftheday, #igmaster, #justgoshoot, #iphonesia).

Fig. 8 Overlap between the top-ranked lurkers detected on the snapshot graph and the top-ranked lurkers detected on each of the topic-specific subgraphs of the snapshot, at top-5%, 20%, and 25%, over the quarters of year 2013 in Instagram (first quarter on the left, last quarter on the right).
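The tag-filtering step applied before learning the LDA model (dropping tags that occur in fewer than five documents or in more than 75% of them) can be sketched without any library as follows. The function name and toy documents are hypothetical; each document is assumed to be the list of tags attached to all media of one user.

```python
from collections import Counter


def filter_tags(documents, min_docs=5, max_fraction=0.75):
    """Keep only tags whose document frequency is at least min_docs
    and at most max_fraction of the collection size."""
    doc_freq = Counter(tag for doc in documents for tag in set(doc))
    limit = max_fraction * len(documents)
    keep = {tag for tag, df in doc_freq.items() if min_docs <= df <= limit}
    return [[tag for tag in doc if tag in keep] for doc in documents]


# Toy collection of 8 user documents (hypothetical tags).
docs = ([["common", "mid"]] * 5) + [["common", "rare"]] + ([["common"]] * 2)
filtered = filter_tags(docs)
# "common" (8 of 8 documents) and "rare" (1 document) are dropped;
# "mid" (5 documents, below the 75% cap of 6) is kept.
```

The filtered documents would then be converted to a bag-of-words corpus and passed to the topic model.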

We used the learned 20-topic LDA model to induce topic-sensitive subgraphs from the Instagram user network. To derive each of these subgraphs, we first aggregated the finer-grain topics learned by LDA into thematically-cohesive topic-sets, then every user was assigned to the topic-set that maximizes the likelihood in the LDA per-document topic distributions. Table 2 shows a (partial) description of the topics learned by LDA, along with the chosen labels for the derived topic-sets and the impact on the size (number of nodes) of the induced topic-specific subgraphs. Note that we also include the miscellanea topic-set, which covered all user documents whose LDA topic distributions were characterized by a quite high topical entropy; again, this is in accord with our study in [25], which highlighted that most users adopt few tags to annotate their media, but also that popular users have higher topical entropy values (i.e., topic specialization is not relevant).
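The assignment of users to topic-sets can be illustrated with the following sketch, which uses the topic-id grouping listed in Table 2. It assumes, purely for illustration, that the likelihood of a topic-set is obtained by summing the per-document probabilities of its member topics; the text does not specify the aggregation, and the function name is hypothetical.

```python
# Topic-set grouping as listed in Table 2 (LDA topic ids per label).
TOPIC_SETS = {
    "nature": [0, 6, 10],
    "architecture": [12, 14],
    "fun": [13],
    "pets": [16],
    "video": [19],
    "miscellanea": [1, 2, 7],
    "travel": [8, 18],
    "attention-seeking": [3, 4, 5, 17, 11],
    "photo art": [9, 15],
}


def assign_topic_set(topic_probs, topic_sets=TOPIC_SETS):
    """Return the topic-set with the largest total probability.

    topic_probs -- dict mapping LDA topic id to its probability in the
                   user's document (missing ids count as zero).
    Assumption: topic-set likelihood = sum of member-topic probabilities.
    """
    scores = {
        label: sum(topic_probs.get(t, 0.0) for t in ids)
        for label, ids in topic_sets.items()
    }
    return max(scores, key=scores.get)


# Toy per-document topic distribution (hypothetical values).
label = assign_topic_set({9: 0.5, 15: 0.2, 0: 0.3})   # "photo art"
```

The nodes assigned to a given label would then induce the corresponding topic-specific subgraph.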

Upon the extraction of topic-specific subgraphs, we looked for clues about major topics (i.e., frequently used tags) that characterize lurkers. To do this, we compared the top-ranked lurkers detected in the full, topic-independent graph and the top-ranked lurkers detected in each of the topic-specific subgraphs, for a given fraction of top-ranked lurkers (varying at 5%, 10% and 25%). More precisely, for each topic, we computed an overlap score as the size of the intersection between the set of top-ranked lurkers in the topic-specific subgraph and the set of top-ranked lurkers in the full graph, divided by the sum of intersection values obtained over all topics. Results (not shown) reveal a relatively good matching between the top-ranked lurkers in the full graph and those relating to the subgraph specific to the photo art topics (overlap ranging from 0.37 at top-5% to 0.25 at top-25%), followed by nature (overlap around 0.13-0.14) and attention-seeking tag topics (overlap of 0.13-0.10). Tags specific to any other topic-set in Table 2 correspond to low overlaps (below 0.05), with the exception of miscellanea, whose corresponding overlaps vary from 0.28 to 0.38 by increasing the size of top-ranked lurkers under consideration. More interestingly, we repeated the above evaluation over selected temporal snapshots of the Instagram network. Figure 8 shows results obtained over the quarters of year 2013, which corresponds to the timespan that covers most user actions and interactions in our Instagram dataset. As can be seen from the plots in the figure, the topic usage behavior of lurkers in each snapshot is mainly characterized by tags that belong to one or more topic-sets; particularly, photo art in the third and second quarter, nature in the last quarter but also in the other ones, pets in the second quarter. It is interesting to observe that, with the exception of the first quarter snapshot, miscellanea tags are not a frequent choice of lurkers, i.e., lurkers are more likely to focus on contents (media) that are well categorized into only one of the identified topic-sets.
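The normalized overlap score defined in this analysis (intersection size for a topic divided by the sum of intersection sizes over all topics) is straightforward to compute, as the sketch below shows. The function name and toy sets are hypothetical.

```python
def topic_overlap_scores(full_top, topic_tops):
    """Overlap score per topic.

    full_top   -- set of top-ranked lurkers in the full graph
    topic_tops -- dict: topic label -> set of top-ranked lurkers in the
                  corresponding topic-specific subgraph
    """
    inter = {topic: len(full_top & top) for topic, top in topic_tops.items()}
    total = sum(inter.values())
    return {topic: (count / total if total else 0.0)
            for topic, count in inter.items()}


# Toy example (hypothetical user ids and topic labels).
scores = topic_overlap_scores(
    {1, 2, 3, 4},
    {"photo art": {1, 2, 9}, "nature": {3, 8}, "pets": {7}},
)
# photo art -> 2/3, nature -> 1/3, pets -> 0.0
```

By construction the scores over all topics sum to one (when at least one intersection is non-empty), which makes topics directly comparable within a snapshot.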

We further analyzed the evolution of topic interests over time. In this regard, we hypothesized that lurkers might exhibit patterns of topical interests that do not significantly differ from those of the other (active) users. Figure 9 shows two transition diagrams which offer a view of how the topical usage patterns change from one state (i.e., topic-set) to another, over the quarters of year 2013 in Instagram, for all users as well as for all top-25% lurkers. Let us first consider the topical evolution of all users (top of Fig. 9). Here we observe that the various levels (i.e., quarters of year 2013) are characterized by a core of topic-sets which, although with varying proportions, are always present over time (i.e., nature, attention-seeking, architecture, and miscellanea). Other topic-sets (e.g., pets and photo art) may correspond to temporary interests of users, as they are present only in some of the levels. Topical usage patterns of the users tend to continuously change over time. We in fact observe transitions from one topic-set state in a level to each of the other states in the next level. Such high dynamicity is not surprising: it is explained by the inherent softness of topic categorization underlying the tags used for the uploaded media in Instagram and similar OSNs; in other terms, users can often adopt tags that naturally belong to more than one topic-set to annotate their media, according to the type of photo or video (e.g., a skyline photo can be equally relevant to the categories photo art, travel, attention-seeking). However, as happens at the second level (quarter), all topic-set states can also show a moderate stability, since a fraction of users (about 20%) do not transition out of a topic-set state once they enter it.

Fig. 9 Topic evolution on Instagram: all users (top) vs. top-25% lurkers (bottom). Levels correspond to quarters of year 2013. Each vertical colored box represents a state as an aggregation of topics, which are learned from the network contents at a given time (level). Gray curves correspond to users transitioning from state to state. The portion of each state that does not have outgoing gray lines represents users that end in this state. States are labeled with their description and their frequency, i.e., the number of users that are assigned to that topic at that level; gray curves are proportional to the topic level frequencies. (Best viewed in color.)


Topical usage transitions in the graph of the top-ranked lurkers (bottom of Fig. 9) are also highly dynamic. The topic-sets per level are either the same as or a subset of those in the all-users graph, showing different relative proportions (i.e., frequency of usage) in some cases (e.g., family in the second level, attention-seeking in the fourth level). This would hence confirm our initial hypothesis that lurkers tend to show patterns of topical interests that do not significantly differ from the ones of all users. A major difference with the all-users graph, however, is that in some cases more transitions flow out from a topic-set state than flow into it, which corresponds to the behavior of lurkers as “newcomers”, i.e., lurkers that were not present in the immediately preceding snapshot graph, but could be in earlier snapshots (cf. Section 3.4). For instance, while several lurkers showing different interests at the second level end in the photo art state at the third level, a nearly equal proportion of new lurkers start from that state, then transition towards different topic-sets.


3.9 Time-aware ranking of lurkers 

Our final research question (Q7) is devoted to the analysis of time-aware ranking of lurkers. We assess the presumed benefits derived from the use of our proposed time-transient and time-cumulative ranking models on the quality of lurker ranking solutions. In the following, we first present our evaluation methodology, then we discuss effectiveness results obtained by our time-aware LurkerRank methods and competing methods.


3.9.1 Data-driven evaluation 


Evaluating lurking in OSNs is a hard problem to deal with, because of the lack of ground-truth data for lurker ranking. In an attempt to simulate a ground-truth evaluation, we build on top of our previous studies [55, 56]. We generate a data-driven ranking (henceforth DD) for every network graph and use it to assess the proposed and competing methods. However, in contrast to the data-driven rankings defined in [55, 56], here we focus on the amount of actions and interactions users perform over a time interval. Formally, given v ∈ V and time interval T, the data-driven ranking score assigned to v over T is computed as:


T^{v) = 


''T-tiu, v) 


(19) 


where n_t(v) denotes the number of actions that v performed at time t to create new contents (e.g., media uploads), and n_t(u, v) denotes the number of information-consumption actions at time t performed by v in response to a specific post by u. Given the characteristics of our selected datasets, we compute n_t(u, v) as the number of “favorite” or “like” actions by v in relation to a media posted by u. We observe however that, in general, an information-consumption action does not necessarily imply that the user will produce visible information such as posting a “like” or “comment” in response to another user’s post. Within this view, timestamped information-consumption actions could refer to the latent or silent interactions, i.e., the actions of reading or watching produced contents; unfortunately, it is not easy to build OSN datasets that are resource-rich in terms of latent interactions, mainly due to privacy policies and API limitations currently imposed by all main OSN services. We leave the opportunity of evaluating the lurker ranking problem focusing on latent interactions as future work (cf. Section 4.4).


3.9.2 Competing methods and assessment criteria 

We compared our proposed methods Ts-LR and Te-LR against the early (i.e., time-unaware) LurkerRank (LR) [55, 56]. Note that applying LR on interaction graphs is here assumed to be consistent with its definition: an interaction graph is a subset of the followship graph, provided however that only visible interactions are taken into account, which is indeed our evaluation setting.

As we presented in Section 2, our proposed time-aware LurkerRank methods are defined upon two models of temporal graph. Therefore, we carried out Ts-LR and Te-LR on different types of graphs, dubbed transient and cumulative snapshots, respectively. More precisely, we hereinafter refer to a monthly, transient snapshot as a snapshot whose timespan is 28 days (cf. Section 3.2). We also use the term monthly, cumulative snapshot to denote a snapshot covering a time window that is one month larger than the previous snapshot; moreover, the start time is fixed for all monthly, cumulative snapshots considered on a network dataset, thus the size of cumulative snapshots follows a non-decreasing function.
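The difference between transient and cumulative monthly snapshots can be sketched as follows. The function name and the event representation (follower, followee, date) are hypothetical; the only elements taken from the text are the 28-day window and the fixed start time of cumulative snapshots.

```python
from datetime import date, timedelta


def monthly_snapshots(events, start, n_months, cumulative=False, days=28):
    """Build one edge set per 28-day "month".

    events     -- iterable of (follower, followee, date) tuples
    cumulative -- if True, every window starts at `start`; otherwise each
                  window covers only its own 28 days (transient snapshot)
    """
    events = list(events)
    snapshots = []
    for i in range(n_months):
        lo = start if cumulative else start + timedelta(days=i * days)
        hi = start + timedelta(days=(i + 1) * days)
        snapshots.append({(v, u) for v, u, d in events if lo <= d < hi})
    return snapshots


# Toy timestamped interactions (hypothetical).
t0 = date(2013, 1, 1)
log = [
    ("v1", "u1", date(2013, 1, 5)),
    ("v2", "u1", date(2013, 2, 3)),
    ("v1", "u2", date(2013, 3, 1)),
]
transient = monthly_snapshots(log, t0, 3)
cumul = monthly_snapshots(log, t0, 3, cumulative=True)
```

With this construction, the sizes of the cumulative snapshots can never decrease from one month to the next, whereas transient snapshots may shrink or grow freely.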


We also included in the evaluation the T-Rank algorithm [7], which is a time-aware adaptation of PageRank; as we discuss in more detail elsewhere in this paper, T-Rank was chosen as a competitor since, like our proposed methods, it also embeds notions of freshness and activity. We used the setting of the parameters in T-Rank as suggested in [7], using uniform values for both the types of coefficients that control the use of temporal information in the time-aware method: the four w_si coefficients used to determine the random jump probabilities, and the six w_ti coefficients that give the transition probabilities of the random surfer. Note also that T-Rank was involved only in our transient evaluation case, since the algorithm was designed to work in transient snapshots. In this regard, we evaluated T-Rank on monthly, transient snapshots, setting the temporal window of interest to the last week of the month, and the tolerance interval to the first three weeks. This choice was made in order to give more importance to recent temporal information (w.r.t. the end-time of each target snapshot).

To evaluate the ranking performance of the various
methods, we used two well-known assessment criteria
for ranking tasks, namely the Kendall-tau rank
correlation coefficient [T] and Fagin’s intersection
metric [23]. Kendall-tau correlation evaluates the
similarity between two rankings, expressed as sets of
ordered pairs, based on the number of inversions of pairs
which are needed to transform one ranking into the other:

$$\tau(L', L'') = 1 - \frac{2\,\Delta(\mathcal{P}(L'), \mathcal{P}(L''))}{M(M-1)}$$

Above, $L'$ and $L''$ are the two rankings to be compared,
$M = |L'| = |L''|$, and $\Delta(\mathcal{P}(L'), \mathcal{P}(L''))$
is the symmetric difference distance between the two
rankings, calculated as the number of unshared pairs
between the two lists. The score returned by $\tau$ is in
the interval $[-1, 1]$, where a value of 1 means that the
two rankings are identical and a value of $-1$ means that
one ranking is the reverse of the other. The Fagin measure
allows for determining how well two ranking lists are in
agreement with each other, taking into account
top-weightedness and partial rankings. Applied to any two
top-$k$ lists $L', L''$, the Fagin


Table 3 Kendall-tau correlation and Fagin’s intersection
performance w.r.t. DD on monthly, cumulative snapshots of the
Flickr followship network: comparison between Te-LR and
LR. (Bold values correspond to the best performance per
assessment criterion.)

snapshot      τ                   F@25%
(end time)    Te-LR    LR         Te-LR    LR
2006-11-30    0.145    0.021      0.138    0.135
2006-12-28    0.148    0.007      0.142    0.135
2007-01-25    0.159    -0.005     0.152    0.138
2007-02-22    0.178    -0.017     0.156    0.135
2007-03-22    0.186    -0.019     0.162    0.133
2007-04-19    0.186    -0.017     0.161    0.133
2007-05-17    0.186    -0.015     0.161    0.133


score is defined as:

$$F(L', L'', k) = \frac{1}{k} \sum_{q=1}^{k} \frac{|L'_{:q} \cap L''_{:q}|}{q}$$

where $L_{:q}$ denotes the set of nodes from the 1st to the
$q$-th position in the ranking. Therefore, $F$ is the
average over the sum of the weighted overlaps based on the
first $k$ nodes in both rankings; the experimental results
we present in Section 3.9.3 correspond to $k$ fixed to
25% of the length of the ranking lists being compared
(denoted as F@25%). Note that the $F$ score is in the
interval $[0, 1]$, where 1 means total agreement and 0
means total disagreement.
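The two assessment criteria can be sketched in a few lines of Python. This is a minimal illustration that follows the verbal definitions given above (symmetric difference of ordered pairs for Kendall-tau, averaged weighted overlap of the top-$q$ prefixes for Fagin's intersection); the function names are hypothetical, and the quadratic pair enumeration is only meant for small rankings.

```python
from itertools import combinations


def ordered_pairs(ranking):
    """All pairs (a, b) such that a is ranked before b."""
    return set(combinations(ranking, 2))


def kendall_tau(l1, l2):
    """1 - 2*Delta / (M*(M-1)), where Delta is the number of ordered
    pairs that belong to exactly one of the two rankings."""
    m = len(l1)
    if m != len(l2) or set(l1) != set(l2):
        raise ValueError("rankings must cover the same items")
    if m < 2:
        return 1.0
    delta = len(ordered_pairs(l1) ^ ordered_pairs(l2))
    return 1.0 - 2.0 * delta / (m * (m - 1))


def fagin_intersection(l1, l2, k):
    """Average over q = 1..k of |top-q(l1) & top-q(l2)| / q."""
    if not 1 <= k <= min(len(l1), len(l2)):
        raise ValueError("k must be between 1 and the list length")
    total = 0.0
    for q in range(1, k + 1):
        total += len(set(l1[:q]) & set(l2[:q])) / q
    return total / k
```

For instance, swapping the first two items of a three-item ranking leaves only one discordant pair, so Kendall-tau drops from 1 to 1/3, while Fagin's intersection at k = 3 drops to 2/3 because the disagreement occurs at the very top of the lists.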


3.9.3 Results 

We focus our evaluation of time-aware LurkerRank
methods on the Flickr and Instagram datasets. A major
reason for this choice is that we wanted to evaluate our
proposed methods against snapshots extracted from a
social (followship) network as well as from an
interaction network. The former scenario was evaluated
thanks to the timestamped followship information that is
only available in Flickr. For the latter scenario we used
the timestamped information about favorite-markings
available from Flickr, and the timestamped information
about comments available from Instagram; note that this
allowed us to evaluate the performance of the methods
over snapshot graphs that correspond to two types of
interactions, i.e., either “likes” or comments. Note also
that we left FriendFeed out of consideration because
it is less rich than Instagram in terms of timestamped
information about comments; moreover, comments in
FriendFeed concern user interactions corresponding to
a relatively smaller portion of the social graph than in
Instagram (cf. Section [T^).

Table 3 shows performance results obtained on monthly
snapshots of the Flickr followship network. Here we
left Ts-LR out of consideration since cumulative
snapshots make more sense than transient snapshots when
Table 4 Kendall-tau correlation and Fagin’s intersection
performance w.r.t. DD on monthly, transient snapshots of the
Flickr interaction network: comparison between Ts-LR and
LR and T-Rank. (Bold values correspond to the best
performance per assessment criterion.)

snapshot      τ                            F@25%
(end time)    Ts-LR    LR       T-Rank     Ts-LR    LR       T-Rank
2006-10-05    0.156    -0.014   -0.154     0.165    0.138    0.069
2006-11-02    0.169    -0.023   -0.167     0.174    0.133    0.068
2006-11-30    0.167    -0.022   -0.159     0.166    0.138    0.063
2006-12-28    0.155    -0.003   -0.147     0.171    0.133    0.068
2007-01-25    0.169    -0.025   -0.16      0.167    0.134    0.073
2007-02-22    0.167    -0.018   -0.159     0.178    0.14     0.066
2007-03-22    0.164    0        -0.14      0.158    0.143    0.073
avg.          0.164    -0.015   -0.155     0.168    0.137    0.069


Table 5 Kendall-tau correlation and Fagin’s intersection
performance w.r.t. DD on monthly, cumulative snapshots of the
Flickr interaction network: comparison between Te-LR and
LR. (Bold values correspond to the best performance per
assessment criterion.)

snapshot      τ                   F@25%
(end time)    Te-LR    LR         Te-LR    LR
2006-10-05    0.156    -0.014     0.165    0.138
2006-11-02    0.177    -0.024     0.175    0.141
2006-11-30    0.179    -0.030     0.173    0.140
2006-12-28    0.185    -0.033     0.179    0.141
2007-01-25    0.191    -0.046     0.177    0.141
2007-02-22    0.197    -0.050     0.175    0.139
2007-03-22    0.200    -0.050     0.176    0.138


followship relations are taken into account. In other
terms, since the set of followships in a network grows
progressively (unless unfollowing actions are
permitted), it is unfair to consider transient snapshots,
which would ignore all the relations created before the
selected time windows. From the table, we observe that
Te-LR always obtains a higher correlation with DD than LR,
with gains in terms of Kendall-tau ranging from 0.124
to 0.205, and gains in terms of Fagin’s intersection up
to 0.029. It should be noted that even though LR shows
negative Kendall-tau correlation w.r.t. DD for more
recent (i.e., cumulatively aggregated) snapshots, its
Fagin’s intersection values are not far from the ones
obtained by Te-LR. This would indicate that Te-LR
is a superior lurker-ranking model when applied
to all users in the network as well as only to the most
prominent lurkers in the network, while the latter would
be suboptimally detected by LR.
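The gain ranges quoted above follow directly from the per-snapshot values of Table 3. As a quick worked check, the short sketch below recomputes them; the variable and function names are chosen for this example only, and the numbers are transcribed from the table.

```python
# Values transcribed from Table 3 (Flickr followship network,
# monthly cumulative snapshots), in chronological order.
tau_te_lr = [0.145, 0.148, 0.159, 0.178, 0.186, 0.186, 0.186]
tau_lr = [0.021, 0.007, -0.005, -0.017, -0.019, -0.017, -0.015]
f_te_lr = [0.138, 0.142, 0.152, 0.156, 0.162, 0.161, 0.161]
f_lr = [0.135, 0.135, 0.138, 0.135, 0.133, 0.133, 0.133]


def gains(ours, baseline):
    """Per-snapshot difference between two methods, to 3 decimals."""
    return [round(a - b, 3) for a, b in zip(ours, baseline)]


tau_gains = gains(tau_te_lr, tau_lr)
f_gains = gains(f_te_lr, f_lr)

print(min(tau_gains), max(tau_gains))  # Kendall-tau gain range
print(max(f_gains))                    # largest Fagin gain
```

The Kendall-tau gain grows from the first to the fifth snapshot mostly because the LR correlation turns negative, while the Te-LR correlation levels off at 0.186.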

Table 4 and Table 5 still focus on Flickr, however on
a different scenario in which the ranking methods are
applied over snapshots of the Flickr interaction
network. Results from Table 4, which correspond to
transient snapshots, show the better performance of Ts-LR
against LR and T-Rank, according to both assessment
criteria (with average Kendall-tau correlation of 0.164
and average Fagin’s intersection of 0.168). In
particular, Ts-LR outperforms T-Rank, with an average gain
of 0.319 in Kendall-tau and of 0.100 in Fagin’s
intersection. Also,


Table 6 Kendall-tau correlation and Fagin’s intersection
performance w.r.t. DD on monthly, transient snapshots of the
Instagram interaction network: comparison between Ts-LR
and LR and T-Rank. (Bold values correspond to the best
performance per assessment criterion.)

snapshot      τ                            F@25%
(end time)    Ts-LR    LR       T-Rank     Ts-LR    LR       T-Rank
2012-07-04    0.367    0.145    0.235      0.227    0.170    0.192
2012-08-01    0.120    0.107    0.179      0.244    0.221    0.103
2012-08-29    0.197    0.153    0.140      0.270    0.234    0.202
2012-09-26    0.255    0.211    0.111      0.246    0.205    0.120
2012-10-24    0.200    0.166    0.126      0.237    0.205    0.148
2012-11-21    0.254    0.234    0.137      0.185    0.154    0.108
2012-12-19    0.231    0.201    0.119      0.230    0.201    0.180
2013-01-16    0.236    0.211    0.095      0.257    0.235    0.126
2013-02-13    0.253    0.221    0.110      0.234    0.210    0.155
2013-03-13    0.203    0.166    0.179      0.195    0.180    0.154
2013-04-10    0.249    0.225    0.137      0.190    0.188    0.167
2013-05-08    0.282    0.253    0.099      0.249    0.235    0.144
2013-06-05    0.282    0.256    0.139      0.219    0.214    0.159
2013-07-03    0.247    0.216    0.136      0.227    0.211    0.135
2013-07-31    0.218    0.201    0.157      0.191    0.190    0.131
2013-08-28    0.236    0.207    0.143      0.201    0.181    0.165
2013-09-25    0.268    0.248    0.132      0.218    0.202    0.130
2013-10-23    0.209    0.191    0.103      0.183    0.173    0.156
2013-11-20    0.234    0.217    0.093      0.231    0.216    0.143
2013-12-18    0.226    0.211    0.115      0.229    0.224    0.126
avg.          0.238    0.202    0.134      0.223    0.203    0.147



Ts-LR achieves higher Kendall-tau correlation w.r.t. DD
than LR (gains up to 0.194, with an average gain of
0.179), while the difference in terms of Fagin’s
intersection is smaller. While the Kendall-tau values
obtained by LR and T-Rank are negative, it should be noted
that T-Rank also shows very low Fagin’s intersection
values (always below 0.1), whereas LR maintains a certain
intersection with the top of DD (average Fagin’s
intersection of 0.137). Analogous conclusions can be drawn
from the evaluation of Te-LR and LR over monthly,
cumulative snapshots of the Flickr interaction network
(Table 5). Again, a significant difference in terms of
Kendall-tau correlation is observed between the
performance of Te-LR and LR (average gap of 0.250, with LR
always showing negative correlation w.r.t. DD), while the
Fagin’s intersection values of LR are only slightly lower
than the ones obtained by Te-LR (average gain of only
0.038 in favor of Te-LR).

Results over the Instagram monthly snapshots are
reported in Table 6 and Table 7. Again, Ts-LR and
Te-LR generally perform better than competitors in the
corresponding evaluation scenarios. Moreover, compared
to the results obtained on Flickr, the performance of LR
is generally closer to that of the time-aware LurkerRank
algorithms. In the transient evaluation case (Table 6),
the average Kendall-tau correlation is positive for all
methods, with Ts-LR showing higher correlation w.r.t. DD
than T-Rank (average gain of 0.104), and similar
correlation values when compared to LR (average gain of
0.036). An analogous situation can be observed when
considering Fagin’s intersection scores, with gains up to
0.141
Table 7 Kendall-tau correlation and Fagin’s intersection
performance w.r.t. DD on monthly, cumulative snapshots of the
Instagram interaction network: comparison between Te-LR
and LR. (Bold values correspond to the best
performance per assessment criterion.)

snapshot      τ                   F@25%
(end time)    Te-LR    LR         Te-LR    LR
2012-07-04    0.366    0.145      0.232    0.170
2012-08-01    0.235    0.112      0.232    0.203
2012-08-29    0.239    0.074      0.210    0.123
2012-09-26    0.233    0.090      0.185    0.116
2012-10-24    0.211    0.097      0.187    0.127
2012-11-21    0.203    0.101      0.180    0.128
2012-12-19    0.188    0.094      0.173    0.131
2013-01-16    0.175    0.092      0.178    0.146
2013-02-13    0.160    0.088      0.174    0.148
2013-03-13    0.148    0.079      0.166    0.144
2013-04-10    0.143    0.079      0.166    0.152
2013-05-08    0.137    0.078      0.163    0.155
2013-06-05    0.126    0.072      0.163    0.156
2013-07-03    0.123    0.076      0.163    0.159
2013-07-31    0.115    0.073      0.163    0.162
2013-08-28    0.112    0.077      0.166    0.166
2013-09-25    0.105    0.075      0.165    0.167
2013-10-23    0.098    0.075      0.171    0.177
2013-11-20    0.091    0.075      0.179    0.187
2013-12-18    0.088    0.079      0.182    0.191



w.r.t. T-Rank and up to 0.057 w.r.t. LR. Differences
between Te-LR and LR are even smaller when looking
at the cumulative case (Table 7), with Kendall-tau
gains that decrease almost steadily from the 0.221 of
the first snapshot to the 0.01 of the last one. Fagin’s
intersection values are very similar in most cases
(maximum gain of 0.087), with LR performing comparably
to or better than Te-LR in the last snapshots. An
explanation might be found in a fact that we
already observed in Section 3.3, that is, lurkers are more
prone to perform actions like “favorites” or “likes” than
to comment on posts. Therefore, the “favorite/like” type
of interaction would act as a better discriminant than
“comment” in capturing the lurker dynamics via a
time-varying graph model.


4 Related Work 


4.1 Lurking in social networks 


Research studies in social science and human-computer
interaction have scrutinized the various definitions of
lurking, analyzed the motivational factors for lurking,
and devised the main strategies for de-lurking. For
instance, Soroka and Rafaeli [S3] investigated relations
between lurking and cultural capital, i.e., a member’s
level of community-oriented knowledge, while lurking
was conceptualized in mM in terms of the users’
boundary spanning and knowledge brokering activities
across multiple community engagement spaces. The
implications behind lurking were also analyzed from
group learning [19], peripheral participation [29], and
epistemological m perspectives.

Understanding lurkers in OSNs refers to a scenario
that has remained quite unexplored in computer science
until recently. Fazeen et al. [33] addressed classification
of the various actors in an OSN (i.e., leaders, spammers,
associates, and lurkers). However, in that work the
lurking problem is treated marginally, and in fact lurking
cases are left out of experimental evaluation. Similarly,
Lang and Wu [36] analyzed various factors that
influence the lifetime of users, also distinguishing
between active and passive lifetime. In this regard, a
number of features are suggested to promote usage among
members of OSNs like Twitter and Buzznet. Moreover, while
examining to what extent active and passive lifetime are
correlated, the authors observed that the study of
passive lifetime requires knowing the user’s last login
date, which is however unavailable for many OSNs
including Twitter. Besides our work on lurker detection and
ranking |531[35] (previously discussed in Section 2.1),
in m we started investigating how lurker behaviors
change over time. In that work we provided a preliminary
characterization of the lurking dynamics in terms
of four out of the seven research questions that we have
addressed in Section 3.


As a final remark, we observe that Te-LR has different
overall behavior in Flickr and in Instagram
time-evolving graphs, which correspond to two diverse
timespans, i.e., 7 months in Flickr against 20 months in
Instagram. In particular, for more recent snapshots, the
ranking performance of Te-LR is generally increasing in
Flickr and decreasing in Instagram, especially in terms
of Kendall-tau. This would suggest a certain sensitivity
of Te-LR to long timespans, which might negatively
affect the Te-LR performance, even yielding worse results
than the basic (i.e., time-unaware) LurkerRank.


4.2 Time-aware PageRank 

Given the popularity of PageRank among researchers in 
web searching of authoritative sources, most solutions 
for time-aware ranking have been developed in the past 
years by resorting to PageRank-style methods. Here we 
briefly discuss some of the most relevant studies, which 
share intuitions with our approach. 

One of the earliest methods that leverage the
temporal dimension in authority ranking is
TimedPageRank [53]. This is basically a weighted PageRank
applied to a citation network in which the strength of
every edge (citation) is weighted by an exponential decay
function of the citation age. An aging factor is also
introduced to linearly penalize the scores w.r.t. the age
of a publication. TimedPageRank adopts a graph model that
represents a static picture of the network at a given
time. By contrast, EventRank [35] takes into account
the sequence of events and can track changes in ranking
over time. Originally developed for an (email)
communication network, EventRank utilizes potential flow
to model the exchange of messages in a cumulative ranking
fashion. As mentioned in the Introduction, our
time-aware lurker ranking methods also handle either
approach (i.e., static or cumulative) to ranking.
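To make the idea of age-weighted edges concrete, the following is a minimal sketch of a weighted PageRank in which each edge weight decays exponentially with its age. It is a generic illustration and not the TimedPageRank formulation itself: the decay form `exp(-rate * age)`, the `rate` value, the damping factor, and all function names are assumptions, and the aging factor on publication scores is not modeled.

```python
import math


def decay_weight(age, rate=0.1):
    """Exponentially decaying edge weight; `rate` is an assumed value."""
    return math.exp(-rate * age)


def weighted_pagerank(edges, damping=0.85, iters=100):
    """Power iteration on weighted edges given as {(u, v): weight}."""
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out_sum = {u: 0.0 for u in nodes}
    for (u, _), w in edges.items():
        out_sum[u] += w
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Mass of nodes without (positive-weight) out-edges is spread
        # uniformly, so that the scores keep summing to 1.
        dangling = sum(rank[u] for u in nodes if out_sum[u] == 0.0)
        new = {u: (1.0 - damping) / n + damping * dangling / n
               for u in nodes}
        for (u, v), w in edges.items():
            if out_sum[u] > 0.0:
                new[v] += damping * rank[u] * w / out_sum[u]
        rank = new
    return rank
```

In this sketch, a node that distributes its score over a recent and an old citation passes a larger share to the recently cited node, which is the qualitative effect that the decay function is meant to produce.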

Our proposed time-aware lurker ranking methods
also share with T-Rank [7] the idea of measuring
freshness and activity aspects, for both individual users
and their relations. T-Rank runs on a graph model in which
nodes and edges are annotated with discrete temporal
information, based on creation, deletion and
modification timestamps of the items. A temporal window of
interest and a tolerance interval are defined: the former
represents the temporal range of interest to the user
(e.g., duration of an event), while the latter represents
a temporal window which surrounds the window of
interest (e.g., the discussion that precedes and follows
the event). The temporal aspects of network evolution are
considered through the definition of freshness and
activity functions. The freshness function depends on the
time when a web page or link was last updated: it is
maximal if a page or a hyperlink is updated within
the user’s temporal window of interest, and decreases
linearly with the distance to that window.
The activity function reflects the rate of updates of a
page’s content and its incoming links, and it is simply
defined as the sum of the freshness values of
modification timestamps within the tolerance interval. The
authors define two PageRank-based algorithms, namely
T-Rank and T-Rank light. The latter takes into
account freshness and activity functions only for skewing
the random jump probabilities during the random walk
process, while T-Rank skews both the random jump
probabilities and the transition probabilities based on
the temporal functions.
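The freshness and activity notions described above can be sketched as follows. This is only an illustration of the verbal description, under stated assumptions: freshness is taken to be 1 inside the window of interest and to decrease linearly to a `floor` value at the boundary of the surrounding interval, and the tolerance interval is represented as the outer interval that contains the window. The function names, the `floor` parameter and these conventions are hypothetical and should not be read as the definitions used in [7].

```python
def freshness(t, window, tolerance, floor=0.0):
    """1.0 inside `window`; linear decrease towards `floor` when moving
    away from it, reaching `floor` at the `tolerance` boundary."""
    w0, w1 = window
    t0, t1 = tolerance
    if w0 <= t <= w1:
        return 1.0
    if t < w0:
        span, dist = w0 - t0, w0 - t
    else:
        span, dist = t1 - w1, t - w1
    if span <= 0 or dist >= span:
        return floor
    return 1.0 - (1.0 - floor) * dist / span


def activity(timestamps, window, tolerance, floor=0.0):
    """Sum of freshness values of the timestamps that fall inside the
    (outer) tolerance interval."""
    t0, t1 = tolerance
    return sum(freshness(t, window, tolerance, floor)
               for t in timestamps if t0 <= t <= t1)
```

With times expressed in days from the start of a 28-day snapshot, the setting used in our evaluation would correspond to `window=(21, 28)` and `tolerance=(0, 28)`: an update on day 25 gets full freshness, whereas one on day 10.5 gets half of it.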

However, we provide different technical solutions that
better fit our task of lurker ranking; in particular, we
provide a more refined notion of activity which allows
for modeling the significant variations in the lurking
score trends. Moreover, unlike T-Rank, we also define
cumulative formulations of those aspects in order to
enable a time-evolving graph model for the ranking of
lurkers. In Section 3.9.3 we have presented an
experimental comparison with T-Rank.


4.3 User activity and interaction dynamics 

In this section we mention recent works which, while
not addressing lurking problems, are somehow related
to ours since they cope with some of the topics we have
covered in Section 3, namely: user activity and
interaction in temporal networks, time series and cluster
analysis in social networks, topical evolution, newcomers,
preferential attachment in directed networks, and
responsiveness.

User activity can be influenced by many factors,
including personal traits, communicative and social
variables, attitudes, and social influence [33]. Macropol
et al. [40] considered timing and activity information
available for users to uncover correlations between
topic-based user activity levels and changes in sentiment.
Since user activity trends and behavioral patterns also
depend on the structure and features of an OSN, they
have often been studied in the context of specific OSNs.
For instance, Arnaboldi et al. [3] examined the dynamic
processes of ego networks and personal social
relationships in Twitter. Wang et al. [60] presented a
detailed analysis of the dynamics of Quora, which
integrates a question-answering system into an OSN,
studying its user-topic graph, social graph and related
questions graph. The relation between user activities and
network structure can also determine the success or
decline of an OSN, as studied in HZj, where Garcia et
al. analyzed the social resilience phenomenon in five
online communities (i.e., Friendster, Livejournal,
Facebook, Orkut and Myspace). Modeling engagement
dynamics in social graphs is also the focus of the study by
Malliaros and Vazirgiannis [41].

Time series analysis represents a key tool for
modeling and mining temporal graphs, and has often been
used to support clustering or classification tasks in
OSNs. For instance, Yang et al. [51] proposed a method for
classifying Twitter users based on the content of their
tweets. The method maps users to time series; in more
detail, tweet features are modeled as time series in
order to amplify latent periodicity patterns in user
communications. Caravelli et al. m introduced a holistic
dynamic clustering framework for identifying evolving
groups and alliances across multiple time granularities
in dynamic graphs.

Considering topic modeling of social media content
in addition to the time dimension, Wagner et al. [59]
explored the impact of coupling content with user profile
data on the development of the users’ topical expertise.
Hu et al. [35] defined a feature-based topic model and
a social-based topic model in the context of large-scale
user-generated documents available from OSN websites.
In the context of online analysis of text streams, Saha
and Sindhwani [50] proposed to process incoming data
together with recently seen documents over a short time
window, with the purpose of producing evolving and
emerging topic sets. Narang et al. [33] integrated text
clustering, topic similarity detection and WordNet to
discover time-evolving conversations around topics in
OSNs that do not have explicit discussion threads (e.g.,
Twitter).

Other specific aspects that we have analyzed in
Section 3 concern the role of newcomers, the
responsiveness dynamics, and the preferential attachment
in directed OSNs. Concerning newcomers, recent attention
has been paid to the impact of community diversity on
the engagement of newcomers [48], the design of
personalized engagement strategies [31], and the
antecedents of newcomers’ participation behavior [SB].
Also, Allaho and Lee [2] found a tendency of collaboration
between masters and newcomers in software development
online communities. Responsiveness is central to gaining
useful insights into how users in an OSN interact with one
another. Allaho and Lee [3] considered the increase in
responsiveness for recommending experts in collaboration
networks like Github.com. On et al. m investigated
responsiveness coupled with engagingness behavior
models in email networks, mainly for tasks of reply order
prediction. Gao et al. proposed an extended
reinforced Poisson process model with a time mapping
process to capture the dynamics underlying retweets and
eventually predict the future popularity of microblogs.
The preferential attachment phenomenon has long
received great attention in network science. Focusing on
directed networks, we observe that some studies have
investigated this concept as a growth model for both
social media networks (e.g., Flickr [32]) and
collaboration networks (e.g., Wikipedia m). Moreover,
Kunegis et al. [55] performed an empirical study of
preferential attachment in OSNs for which temporal
information is available. They showed that most networks
follow a nonlinear preferential attachment model, whose
exponent depends on the type of network considered.


4.4 The challenge of latent user activities 


In Section 3.9.1 we have raised an important issue in the 
evaluation of lurker ranking problems, which is related 
to the difficulty of gathering information about latent 
or silent interactions among users. These are typically 
performed via browsing, reading, or watching activities 
in the OSN environment. 

There have been a number of relatively recent studies
that have examined latent interactions. By using survey
data obtained from nearly 1200 recruited users, Burke et
al. [T3] have studied user interactions on Facebook to
understand relations between user-targeted visible
actions (like those related to wall posts, comments, photo
tagging, etc.), also called directed communication, and
consumption actions, with loneliness and social capital
bonding. They found that directed communication is
associated with higher social capital bonding and lower
loneliness, whereas social capital bridging and
loneliness increase with consumption. In a later work [S],
the Facebook team further focused on the limitations
of visible action indicators (like feedback and friend
counts) in supporting the analysis of the size and
profile of a user’s audience.

A useful tool for modeling both visible and latent
activities of users is represented by clickstream data.
Chatterjee et al. m proposed to trace user clickstream
data to model all activities of users, suggesting main
implications in the design of OSN websites and
advertisement placement. Benevenuto et al. |B] provided an
in-depth analysis of traffic and session patterns of user
workloads. Their study is based on clickstream data
that were collected through a Brazilian OSN-aggregator
website. They examined how frequently users connect
to OSN sites, how long user sessions are, how
inter-request time and inter-session time data are
distributed, and how the physical distance impacts user
interactions. Notably, Benevenuto et al. also discussed the
opportunity of exploring clickstream data to understand
silent interactions: in this regard, they found that
browsing is the most dominant behavior in Orkut (above
90%), and that the number of friends interacting with
a user increases by an order of magnitude compared to
only considering visible activities of users.

Based on data collected from the largest OSN in
China, RenRen, Jiang et al. [32] conducted an extensive
analysis on latent interactions focusing on profile visits.
Latent interaction graphs were found to have
properties that are different from both those of social
graphs and visible interaction graphs. Latent interactions
have extremely low reciprocity, despite the fact that
RenRen allows its users to see who recently visited their
profile. Compared to visible interactions, latent
interactions are more prevalent and more evenly
distributed across a user’s friends. A significant part of
profile visits comes from non-friends of a user, while the
majority of visitors do not browse the same profile twice.
Moreover, profile popularity does not seem to be strictly
correlated with the frequency of updates.

We envisage that the lessons learned from the
above-mentioned studies concerning latent activities of
OSN users could be helpful to enhance the understanding
of lurking behaviors. In particular, involving
information on latent activities extracted from profile
visits and clickstream data would pave the way to new
opportunities of evaluation of lurker ranking problems.
In this regard, we would like to stress again that our
time-aware LurkerRank methods can equally deal with
visible as well as latent information-consumption
activities, and that they are suitable for new scenarios
of data-driven ranking evaluation.


5 Conclusion 

In this work, we advanced research on lurking in OSNs
in a twofold manner. We studied the dynamics of
lurking behaviors in OSNs, by performing a rigorous
analysis aimed at understanding how lurkers relate to
other types of users and how patterns of lurking behaviors
evolve over time. In more detail, we compared lurkers
and inactive users as well as lurkers and newcomers,
investigated preferential attachment between lurkers and
active users, studied the lurkers’ responsiveness to
others’ actions, and performed a cluster analysis of the
lurking trends over time and a topic-sensitive analysis of
evolving patterns of lurking behavior. We also overcame
the time-related limitation of previous formulations of
lurker ranking methods. In this regard, we developed
measures related to freshness and activity trend, both
for individual users and for interactions between users.
Such measures were used as key elements in time-aware
LurkerRank methods, following either a time-static or
a time-evolving graph model. Results have shown the
significance of our time-aware LurkerRank methods.
We have finally discussed open issues, concerning new
opportunities of evaluation of lurker ranking problems
based on the exploitation of user latent activities.